Closed FlyingSamson closed 2 years ago
No worries about the Codacy fail -- give me some time and I'll merge or come back with some notes.
I can't really poke holes in it. It seems to address the issue. The only thing I saw was svg_file_obj.seek(0)
is likely pointless since you just created the stringio and it should start from zero anyway. Though I just read over the changes a few times and might have missed something that better testing would have discovered.
Thank you for taking the time to review the code!
You are right of course. I guess I misinterpreted the documentation of StringIOs ctor assuming that with The stream is positioned at the start of the buffer
they meant that the string given as initial content to the ctor will be placed at the start of the buffer. However, in retrospective, I no longer know why I thought this would make any sense.
Accordingly, I removed all usages of seek(0)
in the code and the tests.
class Document:
def __init__(self, svg_file_name_or_file=None):
This is potentially a breaking change. While svgpathtools hasn't changed much it does get used a lot so changing anything about the already established api is problematic.
doc = Document(filepath="myfile.svg")
Would oddly enough suddenly break if this PR is merged. I'm not sure if anybody did this sort of kwarg call, but this is a minor break in backwards compatibility since they totally could do that, and somebody probably did.
Also, the from_svg_string
method is a staticmethod but seems to cry out to be a classmethod.
And I think svgstr2paths()
might be a better functional call and I don't think the 2 form is needed. I think there was only the 2 form of the other function because returning the svg attributes was a behavior change and would break backwards compatibility so the other form got added in there. But, I'm not exactly sure if that's the providence and I'd need to check the blame to know the actual history of that.
And as a pro-tip the smaller you can make the changes the better. So if you reverted to calling these svg_filename_or_stream filepath and just bit the bullet on that naming, you'd end up able to reduce your changes to a very small shift in the parsing routine. And a couple added functions. Which is much easier to check through since to check a PR we sort of have to track every code path in our head and figure out what could go wrong and hope we did a good job. But, if the changes are minimal especially for something fairly entrenched like this project we it makes review significantly easier since there's very little that could have changed.
You'd have a small parsing change and a class method in document. And two completely optional functions in svg_to_paths. That change could be rubber stamped fairly easily, since it would be quite clear it would work.
There's also a reasonable possibility that you could differentiate the strings. If the string you're given contains something many different lines and brackets and doesn't end with '.svg' or something, maybe it's not a filename, maybe it's a string of svg contents? And maybe you could tell the difference. Afterall, a string of svg contents would actually parse when you tried to feed it to an xml parser whereas a filename would not be valid xml and would usually contain something that looks pathy.
If done correctly you could tell that:
doc = Document(filepath="myfile.svg")
and
doc = Document(u'''<?xml version="1.0" encoding="utf-8" ?>
<svg width="3.0cm" height="3.0cm" viewBox="0 0 100 100" xmlns="http://www.w3.org/2000/svg"
xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:xlink="http://www.w3.org/1999/xlink">
<g display="none">
<line display="show" x1="0.0" x2="0.0" y1="0.0" y2="100"/>
</g>
</svg>''')
May be attempting to convey different bits of information. After all it's fairly rare for filenames to start with <
but I think any xml string actually would, just as containing tabs and newlines seems a little bit suspect for a filename.
Ok, I now switched back to the old function parameters, changes the static method to class method (completely forgot that such a distinction existed in python), and renamed the method to svgstr2paths()
.
Regarding the check for svg string vs file location: I totally see your point that it should be possible to differentiate this two cases programmatically. However, I'm not sure at which point we can be sure that the given string is not a filename. For instance how about:
<
, the last non-space character is a >
(i.e., it matches the regex ^\s*<
and >\s*$
).*<svg.*</svg.*
(or, probably a little faster, has a match for <svg
and </svg
)This will work in virtually 100% of the cases, but what if someone indeed decides to provide a file called <some-string-here>
? Of course this would be quite strange, but nevertheless he would have no option to force the package to treat this as a file rather than a svg string (except of course opening the file himself) and because I believe that there will always be a svg/xml string that would also pass as valid file name (at least in the Unix world) I'm not sure if this is the ideal solution.
An alternative might be to add a boolean parameter to the function, e.g., input_is_svg_string
, which is False by default and would emulate the behavior of the svgstr2path()
function if it is set to True. But I do not like that solution very much, either.
Typically and fairly reasonably you need to make a decision. I checked and apparently <a/>
may actually be able to be a filename in linux and that is valid xml. So you cannot be assured you are being given an xml document or a path. But, in many other OSes the < value is absolutely disallowed and you'd very very rarely ever see such a thing. You could perhaps feed it through an xml parser but apparently some valid namespaces could also be structually valid xml in a fairly extreme case. Though really anything that starts with <
should be a svg_string. You could get fancier and sometimes XML starts with a comment thing that defines something about the xml but all of them open with <
.
This PR though does fix the error intended. And the non-filename string type could be added in a different PR. I think most everybody avoids <> in filenames and I'd be surprised if anything starting with <
was a file. Though you could always try to open it as a file and if throws any of the errors you'd expect you could also try to parse the string as xml and then throw your own error if it neither loaded as a file or parsed as xml.
Thanks @FlyingSamson and @tatarize, this was a well-done pull-request. I made some changes (to add/retain support for pathlib
filepaths) and merged it in.
@tatarize really good work here, your input was really helpful. @FlyingSamson awesome job with this contribution and the collaboration with @tatarize.
Thanks to you both for the contribution.
I had the same problem as described in #138. Since his PR #139 after having received feedback from @tatarize seemed to no longer be under active development, I decided to give it a shot my self trying to take into account the suggestions by @tatarize.
I basically check in both the function
svg2paths
and the classDocument
if the passed object is astr
. If so it is treated as a file location just as before. In every other case the object is assumed to be file-like (that is, having at least aread
function) and is then passed directly into the respective XML parsers which already differentiate correctly in strings describing a file location on disc and file-like objects providing the data directly. This then supports files opened with the build-inopen
function as well asStringIO
objects for pipelining svg objects between different packages.Moreover, I implemented overloads for
svg2paths
andsvg2paths2
, namelysvg_string2paths
andsvg_string2paths2
that take astr
object, wrap it in aStringIO
object and then call the already existing functions that were alternated to be able to handle this kind of object. A static factory methodfrom_svg_string(str)
was also added to theDocument
class to allow creating such an object from astr
object. This also internal wraps thestr
object into aStringIO
object.I provide tests for the
svg2paths
functions, the changed ctor ofDocument
and the new factory method, to test that they all work for file system paths, files, StringIO objects, and strings.We might want to add some information for this new gained flexibility in the documentation. I didn't know, however, which file I would need to manipulate the ipython notebook or the markdown and whether the other one is autogenerated in some way from the other.
If you dislike any of my changes please feel free to make any changes to my proposed changes.