tectonic-typesetting / tectonic

A modernized, complete, self-contained TeX/LaTeX engine, powered by XeTeX and TeXLive.
https://tectonic-typesetting.github.io/
Other
3.99k stars 162 forks source link

A more robust manifest format #74

Open rekka opened 7 years ago

rekka commented 7 years ago

Currently the manifest format for the local cache is stored in a space-separated format without any escaping. This leads to invalid entries being inserted if for instance files with spaces are requested:

\input{"with space.tex"}\bye

produces entries

with space.tex 0 -
with space.tex.tex 0 -

It is not really a serious issue at the moment, but a future-proofing of the format is desirable.

I believe that one appropriate solution would be to use the csv crate, which also allows for space-separated entries. So it should be backwards compatible with the current format. It also offers easy parsing using serde. Moreover, it will most likely hit version 1 soon.

I am happy to implement this.

pkgw commented 7 years ago

Yes, it would be good to tighten things up here.

I might prefer to stick with the current format but just reverse the ordering: size, SHA256, filename. That way you can just split on the first two spaces and take the rest as a lump.

Although I guess we're still fragile if anyone uses a filename containing a newline ...

rekka commented 7 years ago

That is actually a pretty good solution too. Now files with newlines...

alexander-bauer commented 7 years ago

I was just browsing through looking at this, and poked around until I found the relevant parts of the code. It looks like src/io/local_cache:176 is where it gets written; I'm not sure where it gets read from the manifest.

rekka commented 7 years ago

It is a bit above that, in the new method.

alexander-bauer commented 7 years ago

@rekka I can go ahead and make a PR for this, but I'd also hate to duplicate work, if you'd like to

rekka commented 7 years ago

I'm not currently working on this, but I think we should reach a consensus with @pkgw on what the best approach here is.

I am not leaning towards a simple format as suggested by @pkgw with: size SHA256 filename, and completely disallowing files with new lines to enter the manifest. This can be easily parsed by str::splitn method.

pkgw commented 7 years ago

@rekka Do you perhaps mean "now" and not "not" in your most recent comment?

I think this is a good change but I am a little bit concerned about how to transition from an old manifest format to a new one. I guess we can just munge a version code into the file name, which seems reasonably future-proof if we want to make any other changes to the format.

rekka commented 7 years ago

Ah, sorry about that, I meant "not", not working on this.

A version code is a good option. I suppose you mean that tectonic would just discard an outdated manifest and rebuild the cache from scratch. That's simple.

pkgw commented 7 years ago

Yes, re-downloading files is a bit annoying but it seems like the lowest-pain option. I don't want people to have to be told to manually delete their cache directories if I can avoid it.

alexander-bauer commented 7 years ago

An option that's a little less rough on the UX would be to migrate from the older manifest version automatically, but I suspect that maintaining the code for that (and all future manifest improvements) would be daunting.

Perhaps the UX problem could be offset (if it needs to be at all) by printing an explanatory message when creating the new manifest if an old one is already present?