Closed whikloj closed 3 years ago
Merging #18 into main will increase coverage by
0.45%
. The diff coverage is79.54%
.
@@ Coverage Diff @@
## main #18 +/- ##
============================================
+ Coverage 78.35% 78.81% +0.45%
- Complexity 456 461 +5
============================================
Files 7 7
Lines 1266 1284 +18
============================================
+ Hits 992 1012 +20
+ Misses 274 272 -2
Impacted Files | Coverage Δ | Complexity Δ | |
---|---|---|---|
src/Bag.php | 79.37% <79.54%> (+0.73%) |
276.00 <1.00> (+5.00) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 667ae0b...64c88da. Read the comment docs.
Dear @whikloj,
the RFC mentions
It is RECOMMENDED that lines not exceed 79 characters in length. Long values MAY be continued onto the next line by inserting a LF, CR, or CRLF, and then indenting the next line with one or more linear white space characters (spaces or tabs). Except for linebreaks, such padding does not form part of the value.
(https://tools.ietf.org/html/rfc8493#section-2.2.2)
Best regards, JS
I tried your changes with a really long value (3 lines text with some colons between): creating and even reading the info bag was successful!
Hi @jens-st, I don't understand your comment. Could you help explain what you are trying to direct my attention to?
@henning-gerhardt could you give this one more test. I made a couple changes to support reading bags with newlines in the bag-info.txt
tags without removing those newlines. It should all be good, but I'd appreciate a once over and then I'll cut a point release.
I checked your latest changes and I'm unsure. If I put a very long value into a bag info tag, which is wrapped inside bag-info.txt
into multiple lines, then I would assume on reading this bag back that I'm getting this bag info tag without any formation and / or added new lines back.
Okay that is a fair point. To properly wrap lines on word break before we hit 79 characters makes the actual line length dependent on the word length. So it would be hard to tell if the newline was an auto-wrapping or a deliberate carriage return.
So maybe the trade-off is we leave your files on disk alone when reading, but if you execute $bag->update()
then we are reformatting your lines. Which is unfortunate but I don't really see away around it right now. I'll revert those changes.
Actually, I may have talked myself out of this. The section of the RFC that @jens-st referenced above says
...Except for linebreaks, such padding does not form part of the value.
Which I read to mean that linebreaks ARE part of the value. So maybe instead of removing them by default I should be storing them, but providing a way to retrieve either the original value (with line breaks) or a clean value (without line breaks). Thoughts?
I will discuss this with my teammates (one is maintaining a Perl based BagIt implementation) on Monday or Tuesday. As I'm only creating the bag but don't read it later I have only limited view on this.
Edit: the bag info read method on the Perl implementation is located here: https://metacpan.org/release/Archive-BagIt/source/lib/Archive/BagIt/Base.pm#L657 ... but I have only a limited knowledge in Perl.
Ok, we discussed this: adding line breaks and providing them on read is ok. But added padding white spaces must be removed on read / providing the bag info tag value.
Sorry for the delay on this @henning-gerhardt. So I've gone with lines over 70 characters are considered to have auto-wrapped. So I remove the newline characters. Lines under that, I keep the newline.
I'm looking at storing all the newlines and providing a way to access it with or without. So the data is pure but you can access it as a single string. But that will come later.
Let me know if this still works for you.
Sorry for my late reply but I was in vacation. Changes looks good for my use case.
You never have to apologize for taking vacation 😄
Resolves: #17
Reverse the tests to assume lines starting with at least 2 spaces or a tab character are the continuation of the previous line. Otherwise start checking for a new tag.
@henning-gerhardt does this sound reasonable?