Open rekka opened 7 years ago
Yes, it would be good to tighten things up here.
I might prefer to stick with the current format but just reverse the ordering: size, SHA256, filename. That way you can just split on the first two spaces and take the rest as a lump.
Although I guess we're still fragile if anyone uses a filename containing a newline ...
That is actually a pretty good solution too. Now files with newlines...
I was just browsing through looking at this, and poked around until I found the relevant parts of the code. It looks like src/io/local_cache:176
is where it gets written; I'm not sure where it gets read from the manifest.
It is a bit above that, in the new
method.
@rekka I can go ahead and make a PR for this, but I'd also hate to duplicate work, if you'd like to
I'm not currently working on this, but I think we should reach a consensus with @pkgw on what the best approach here is.
I am not leaning towards a simple format as suggested by @pkgw with: size SHA256 filename, and completely disallowing files with new lines to enter the manifest. This can be easily parsed by str::splitn
method.
@rekka Do you perhaps mean "now" and not "not" in your most recent comment?
I think this is a good change but I am a little bit concerned about how to transition from an old manifest format to a new one. I guess we can just munge a version code into the file name, which seems reasonably future-proof if we want to make any other changes to the format.
Ah, sorry about that, I meant "not", not working on this.
A version code is a good option. I suppose you mean that tectonic would just discard an outdated manifest and rebuild the cache from scratch. That's simple.
Yes, re-downloading files is a bit annoying but it seems like the lowest-pain option. I don't want people to have to be told to manually delete their cache directories if I can avoid it.
An option that's a little less rough on the UX would be to migrate from the older manifest version automatically, but I suspect that maintaining the code for that (and all future manifest improvements) would be daunting.
Perhaps the UX problem could be offset (if it needs to be at all) by printing an explanatory message when creating the new manifest if an old one is already present?
Currently the manifest format for the local cache is stored in a space-separated format without any escaping. This leads to invalid entries being inserted if for instance files with spaces are requested:
produces entries
It is not really a serious issue at the moment, but a future-proofing of the format is desirable.
I believe that one appropriate solution would be to use the csv crate, which also allows for space-separated entries. So it should be backwards compatible with the current format. It also offers easy parsing using
serde
. Moreover, it will most likely hit version 1 soon.I am happy to implement this.