mathandy / svgpathtools

A collection of tools for manipulating and analyzing SVG Path objects and Bezier curves.
MIT License
532 stars 134 forks source link

Support reading of SVGs from strings and file-like objects #176

Closed FlyingSamson closed 2 years ago

FlyingSamson commented 2 years ago

I had the same problem as described in #138. Since his PR #139 after having received feedback from @tatarize seemed to no longer be under active development, I decided to give it a shot my self trying to take into account the suggestions by @tatarize.

I basically check in both the function svg2paths and the class Document if the passed object is a str. If so it is treated as a file location just as before. In every other case the object is assumed to be file-like (that is, having at least a read function) and is then passed directly into the respective XML parsers which already differentiate correctly in strings describing a file location on disc and file-like objects providing the data directly. This then supports files opened with the build-in open function as well as StringIO objects for pipelining svg objects between different packages.

Moreover, I implemented overloads for svg2paths and svg2paths2, namely svg_string2paths and svg_string2paths2 that take a str object, wrap it in a StringIO object and then call the already existing functions that were alternated to be able to handle this kind of object. A static factory method from_svg_string(str) was also added to the Document class to allow creating such an object from a str object. This also internal wraps the str object into a StringIO object.

I provide tests for the svg2paths functions, the changed ctor of Document and the new factory method, to test that they all work for file system paths, files, StringIO objects, and strings.

We might want to add some information for this new gained flexibility in the documentation. I didn't know, however, which file I would need to manipulate the ipython notebook or the markdown and whether the other one is autogenerated in some way from the other.

If you dislike any of my changes please feel free to make any changes to my proposed changes.

mathandy commented 2 years ago

No worries about the Codacy fail -- give me some time and I'll merge or come back with some notes.

tatarize commented 2 years ago

I can't really poke holes in it. It seems to address the issue. The only thing I saw was svg_file_obj.seek(0) is likely pointless since you just created the stringio and it should start from zero anyway. Though I just read over the changes a few times and might have missed something that better testing would have discovered.

FlyingSamson commented 2 years ago

Thank you for taking the time to review the code!

You are right of course. I guess I misinterpreted the documentation of StringIOs ctor assuming that with The stream is positioned at the start of the buffer they meant that the string given as initial content to the ctor will be placed at the start of the buffer. However, in retrospective, I no longer know why I thought this would make any sense.

Accordingly, I removed all usages of seek(0) in the code and the tests.

tatarize commented 2 years ago
class Document:
    def __init__(self, svg_file_name_or_file=None):

This is potentially a breaking change. While svgpathtools hasn't changed much it does get used a lot so changing anything about the already established api is problematic.

doc = Document(filepath="myfile.svg")

Would oddly enough suddenly break if this PR is merged. I'm not sure if anybody did this sort of kwarg call, but this is a minor break in backwards compatibility since they totally could do that, and somebody probably did.

Also, the from_svg_string method is a staticmethod but seems to cry out to be a classmethod.

And I think svgstr2paths() might be a better functional call and I don't think the 2 form is needed. I think there was only the 2 form of the other function because returning the svg attributes was a behavior change and would break backwards compatibility so the other form got added in there. But, I'm not exactly sure if that's the providence and I'd need to check the blame to know the actual history of that.

tatarize commented 2 years ago

And as a pro-tip the smaller you can make the changes the better. So if you reverted to calling these svg_filename_or_stream filepath and just bit the bullet on that naming, you'd end up able to reduce your changes to a very small shift in the parsing routine. And a couple added functions. Which is much easier to check through since to check a PR we sort of have to track every code path in our head and figure out what could go wrong and hope we did a good job. But, if the changes are minimal especially for something fairly entrenched like this project we it makes review significantly easier since there's very little that could have changed.

You'd have a small parsing change and a class method in document. And two completely optional functions in svg_to_paths. That change could be rubber stamped fairly easily, since it would be quite clear it would work.


There's also a reasonable possibility that you could differentiate the strings. If the string you're given contains something many different lines and brackets and doesn't end with '.svg' or something, maybe it's not a filename, maybe it's a string of svg contents? And maybe you could tell the difference. Afterall, a string of svg contents would actually parse when you tried to feed it to an xml parser whereas a filename would not be valid xml and would usually contain something that looks pathy.

If done correctly you could tell that:

doc = Document(filepath="myfile.svg")

and

doc = Document(u'''<?xml version="1.0" encoding="utf-8" ?>
                        <svg width="3.0cm" height="3.0cm" viewBox="0 0 100 100" xmlns="http://www.w3.org/2000/svg" 
                        xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:xlink="http://www.w3.org/1999/xlink">
                        <g display="none">
                        <line display="show" x1="0.0" x2="0.0" y1="0.0" y2="100"/>
                        </g>
                        </svg>''')

May be attempting to convey different bits of information. After all it's fairly rare for filenames to start with < but I think any xml string actually would, just as containing tabs and newlines seems a little bit suspect for a filename.

FlyingSamson commented 2 years ago

Ok, I now switched back to the old function parameters, changes the static method to class method (completely forgot that such a distinction existed in python), and renamed the method to svgstr2paths().

Regarding the check for svg string vs file location: I totally see your point that it should be possible to differentiate this two cases programmatically. However, I'm not sure at which point we can be sure that the given string is not a filename. For instance how about:

This will work in virtually 100% of the cases, but what if someone indeed decides to provide a file called <some-string-here>? Of course this would be quite strange, but nevertheless he would have no option to force the package to treat this as a file rather than a svg string (except of course opening the file himself) and because I believe that there will always be a svg/xml string that would also pass as valid file name (at least in the Unix world) I'm not sure if this is the ideal solution.

An alternative might be to add a boolean parameter to the function, e.g., input_is_svg_string, which is False by default and would emulate the behavior of the svgstr2path() function if it is set to True. But I do not like that solution very much, either.

tatarize commented 2 years ago

Typically and fairly reasonably you need to make a decision. I checked and apparently <a/> may actually be able to be a filename in linux and that is valid xml. So you cannot be assured you are being given an xml document or a path. But, in many other OSes the < value is absolutely disallowed and you'd very very rarely ever see such a thing. You could perhaps feed it through an xml parser but apparently some valid namespaces could also be structually valid xml in a fairly extreme case. Though really anything that starts with < should be a svg_string. You could get fancier and sometimes XML starts with a comment thing that defines something about the xml but all of them open with <.

This PR though does fix the error intended. And the non-filename string type could be added in a different PR. I think most everybody avoids <> in filenames and I'd be surprised if anything starting with < was a file. Though you could always try to open it as a file and if throws any of the errors you'd expect you could also try to parse the string as xml and then throw your own error if it neither loaded as a file or parsed as xml.

mathandy commented 2 years ago

Thanks @FlyingSamson and @tatarize, this was a well-done pull-request. I made some changes (to add/retain support for pathlib filepaths) and merged it in.
@tatarize really good work here, your input was really helpful. @FlyingSamson awesome job with this contribution and the collaboration with @tatarize. Thanks to you both for the contribution.