Closed boobayayo closed 1 year ago
Thank you for this report. I am sorry for the delay. I am back from my vacation and now again trying to catch up on github issues.
I have made some improvements to some URL parsing logic in the past couple of months, so I guess, absent other changes in this downloader, this is what has broken things for you since 2022-04-22. I think I fixed some URL Class ordering, so that more complicated URL classes would be matched before simpler ones. As an example:
https://site.com/123456?display=large
is considered more complicated than
https://site.com/123456
Maybe this affects you, maybe it is something else and I am just forgetting what I have changed. I note that URL 2 is basically URL 1 but with some parameters, so maybe something is getting confused here. It might be worth checking your 'manage url class links' dialog, just to make sure your URL classes are linked up to the parsers you expect and there aren't any 'instagram parser test (do not use)' spare objects that somehow got linked up without you realising. It is also worth pasting these URLs into the test area in 'manage url classes', just to make sure they are being matched to the URL classes you expect.
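To illustrate the ordering idea, here is a minimal sketch. It is not hydrus's actual matching code: the class names, the tuple layout, and the complexity score (path segments plus required query parameters) are all assumptions made for the example. It only shows why trying the more complicated URL classes first keeps a simpler class from shadowing them.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URL classes: (name, required path segment count, required query keys).
URL_CLASSES = [
    ("site post", 1, set()),
    ("site post (large display)", 1, {"display"}),
]

def complexity(url_class):
    # Assumed scoring: more path segments and more required parameters
    # count as "more complicated".
    _name, segment_count, required_keys = url_class
    return segment_count + len(required_keys)

def matches(url_class, url):
    _name, segment_count, required_keys = url_class
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    keys = {k for k, _v in parse_qsl(parts.query)}
    return len(segments) == segment_count and required_keys <= keys

def match_url(url):
    # Try the most complicated classes first so a simpler class cannot shadow them.
    for url_class in sorted(URL_CLASSES, key=complexity, reverse=True):
        if matches(url_class, url):
            return url_class[0]
    return None
```

In this sketch, the simple class would also accept the `?display=large` URL if it were tried first, which is the kind of mismatch the reordering is meant to avoid.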
The second question is the URL append in part 4. Your idea about it being /assets/picture.png sounds correct, but I'm not sure why the URL would be seen this way--it doesn't seem to have any weird characters or anything that might throw the parser off. But I do note that it is added with URL-encoded characters. Rather than https://, it is adding:
https%3A%2F%2Fscontent-dus1-1.cdninstagram.com%2Fv%2Ft51.2885-15%2F287715666_312179114333180_6690026352780841598_n.jpg%3Fstp%3Ddst-jpg_e35&cb=9ad74b5e-88ad7ee8&_nc_ht=scontent-dus1-1.cdninstagram.com&_nc_cat=108&_nc_ohc=DTuw5_uDzEIAX-PLzEP&edm=ANmP7GQBAAAA&ccb=7-5&ig_cache_key=Mjg2MDcyMDk0NjUxNDQxODgyNQ%3D%3D.2-ccb7-5&oh=00_AT_p8Exu4hxnUY0uNvyd-P6NxlgqCk0wpsBaw8HL_MR3oQ&oe=62AC740A&_nc_sid=276363
With %3A and %2Fs going on. So it feels like something is being encoded (or maybe the parsed URL is not decoded?) before it hits the last step of the Content Parser. Not sure why the manual parsing would work ok though. :/
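As a rough illustration of why a still-encoded URL could end up appended to the story URL, here is a small sketch using Python's standard `urljoin`. This is an assumption about the general mechanism, not a statement about how hydrus resolves parsed URLs internally, and the base and media URLs are hypothetical stand-ins.

```python
from urllib.parse import unquote, urljoin

# Hypothetical stand-ins for the story URL and the parsed media URL.
BASE_URL = "https://www.instagram.com/stories/someuser/123/"
ENCODED_MEDIA_URL = "https%3A%2F%2Fcdn.example.com%2Fv%2Fpicture.jpg"

def resolve(base_url, parsed_url):
    # Standard relative-reference resolution: a string without a scheme is
    # treated as a path relative to the base URL.
    return urljoin(base_url, parsed_url)

# Still encoded: there is no literal ":" scheme separator, so the string is
# treated as a relative path and appended to the base URL.
appended = resolve(BASE_URL, ENCODED_MEDIA_URL)

# Decoded first: recognised as an absolute URL and returned unchanged.
absolute = resolve(BASE_URL, unquote(ENCODED_MEDIA_URL))

print(appended)
print(absolute)
```

If something like this is happening, decoding the parsed URL before it is resolved against the base URL would be enough to make it absolute again.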
Do you think you could bundle all these objects into a png or some JSON and post them here, so I can test them on my end? If you haven't done it before, hit up network->downloaders->export downloaders and then add all your instagram objects to it, and then export to png and post here.
Thank you for looking into this.
My url class links look correct, I've reapplied them just to be sure, but that has not changed anything. (the regular story post url classes (first and third entry) are not linked since I only add them as associated urls)
The "manage url classes" dialog also reliably identifies all relevant urls properly.
Also for completeness' sake, v490 did not change any of this behavior.
Here's all the Instagram story stuff as png (example links are nsfw btw):
edit: Oh and the regular Instagram post parser. (working fine)(needs login cookies)
fyi I worked around this by percent-encoding the url parameter in my userscript and then decoding it in the url parser.
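For anyone trying the same workaround, the round trip looks roughly like the sketch below. It is written in Python purely for illustration (the userscript itself is not shown in this thread), and the `media_url` parameter name and both URLs are hypothetical.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical values; the real userscript and parameter names are not shown here.
STORY_URL = "https://www.instagram.com/stories/someuser/123/"
MEDIA_URL = "https://cdn.example.com/v/picture.jpg?stp=dst-jpg_e35&cb=9ad74b5e"

def build_import_url(story_url, media_url):
    # safe="" also encodes ":", "/", "?", "&" and "=", so the media URL's own
    # parameters cannot be confused with the outer query string.
    return f"{story_url}?media_url={quote(media_url, safe='')}"

def extract_media_url(import_url):
    # parse_qs percent-decodes values, restoring the original media URL.
    query = parse_qs(urlsplit(import_url).query)
    return query["media_url"][0]

import_url = build_import_url(STORY_URL, MEDIA_URL)
```

Encoding the whole media URL keeps it as a single parameter value, and the decode step on the parser side returns it to a normal absolute URL before it is pursued.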
Hydrus version
488d
Operating system
Windows 11
Install method
Third party (AUR, Docker, Chocolatey, etc. Specify in comments)
Install and OS comments
scoop package manager; same behavior on regular extract
Bug description and reproduction
Background
Since parsing of instagram stories does not work in hydrus directly (js stuff I assume), I've built a userscript that generates a url in-browser, containing some metadata and the media download url (image/video) as url parameters (2. below). I then have a url class matching these urls, including the custom parameters. This url class is linked to a parser which parses the metadata and media url using regex replacement on the url context variable. The media url (3.) is then pursued, recognized as a file, downloaded and gets the parsed tags attached to it.
This has all worked perfectly before. (last used on 2022-04-22)
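The regex-replacement step described above works roughly like the following sketch. The generated URL, the `username` and `media_url` parameter names, and the patterns are hypothetical, and Python's `re.sub` stands in for the parser's regex substitution.

```python
import re

# Hypothetical generated URL; the real parameter names are not shown in this report.
GENERATED_URL = (
    "https://www.instagram.com/stories/someuser/123/"
    "?username=someuser&media_url=https://cdn.example.com/v/picture.jpg"
)

def regex_replace(pattern, replacement, text):
    # Replace the whole URL with just the captured group.
    return re.sub(pattern, replacement, text)

# Metadata value: everything after "username=" up to the next "&".
username = regex_replace(r"^.*[?&]username=([^&]+).*$", r"\1", GENERATED_URL)

# Media URL: everything after "media_url=" to the end of the string.
media_url = regex_replace(r"^.*[?&]media_url=(.+)$", r"\1", GENERATED_URL)
```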
Problem
Now, when using url import, Hydrus seems to append the parsed media url to the base instagram story url (4.). This behavior suggests to me that it thinks the media url is not a valid full url (like when parsing a relative path such as /assets/picture.png). When testing the parser manually in the parser configuration screen this does not happen (5.).
More info
For testing I have modified my parser to output a heavily simplified url, and even when removing the entire path, url import still has the same error (6. and 7.).
Another parser that outputs similar cdninstagram urls works flawlessly (regular instagram post parser).
Sorry for the wall of text, I might just be missing something obvious here, but I can't figure it out.
Log output