Closed bfabio closed 2 years ago
pcvalidator -export expands the URLs only if -remote-base-url is passed. This is documented in the source but not to the user.
@sebbalex is there a chance the crawler could run it with no or empty RemoteBaseURL?
RemoteBaseURL is needed to enforce absolute and relative url validation:
If left empty, absolute URLs will not be validated and no remote validation of files with relative paths will be performed.
Furthermore, there was no evidence of this in the past, and since no changes were made in our codebase, this confuses me.
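The behaviour described in the RemoteBaseURL comment above can be sketched roughly as follows. This is only an illustration, not the crawler's or the parser's actual code: `expandURL`, the example base URL, and the `docs/logo.png` path are all hypothetical. It assumes expansion is a plain RFC 3986 reference resolution and that an empty base leaves the value untouched, which would match the symptom in this issue.

```go
package main

import (
	"fmt"
	"net/url"
)

// expandURL is a hypothetical helper (not taken from the crawler):
// it resolves value against remoteBaseURL when value is a relative path.
func expandURL(remoteBaseURL, value string) (string, error) {
	ref, err := url.Parse(value)
	if err != nil {
		return "", fmt.Errorf("invalid URL or path %q: %w", value, err)
	}
	// Absolute URLs need no expansion; with an empty base there is
	// nothing to resolve against, so the value is returned unchanged.
	if ref.IsAbs() || remoteBaseURL == "" {
		return value, nil
	}
	base, err := url.Parse(remoteBaseURL)
	if err != nil {
		return "", fmt.Errorf("invalid base URL %q: %w", remoteBaseURL, err)
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	// Example base URL, assumed for illustration only.
	base := "https://raw.githubusercontent.com/example-org/example-repo/master/"
	for _, b := range []string{base, ""} {
		out, err := expandURL(b, "docs/logo.png")
		fmt.Printf("base=%q result=%q err=%v\n", b, out, err)
	}
}
```

Note that the base needs its trailing slash: without it, reference resolution replaces the last path segment instead of appending to it.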
Sorry I wasn't clear. I was trying to say that the crawler being run with an empty RemoteBaseURL for some exotic reasons could explain what we are seeing. I'm just as puzzled as you. :thinking:
In the latest run I noticed these timeout problems. I think they are related to the URL expansion issue we have here.
time="2020-09-24T08:35:11Z" level=error msg="Error parsing publiccode.yml: logo: HTTP GET failed for https://raw.githubusercontent.com/AgID/rndt-joomla-template/master/documentation/images/logo-rndt.png: Get https://raw.githubusercontent.com/AgID/rndt-joomla-template/master/documentation/images/logo-rndt.png: dial tcp 151.101.36.133:443: i/o timeout"
time="2020-09-24T08:35:12Z" level=error msg="Error parsing publiccode.yml: logo: HTTP GET failed for https://raw.githubusercontent.com/AgID/rndt-catalogue/master/documentation/images/logo-rndt.png: Get https://raw.githubusercontent.com/AgID/rndt-catalogue/master/documentation/images/logo-rndt.png: dial tcp 151.101.36.133:443: i/o timeout"
time="2020-09-24T08:35:13Z" level=error msg="Error parsing publiccode.yml: logo: HTTP GET failed for https://raw.githubusercontent.com/italia/18app/master/src/Italia.DiciottoApp.iOS/Assets.xcassets/AppIcon.appiconset/Icon120.png: Get https://raw.githubusercontent.com/italia/18app/master/src/Italia.DiciottoApp.iOS/Assets.xcassets/AppIcon.appiconset/Icon120.png: dial tcp 151.101.36.133:443: i/o timeout"
time="2020-09-24T08:35:13Z" level=error msg="Error parsing publiccode.yml: description/it/screenshots: HTTP GET failed for https://raw.githubusercontent.com/consiglionazionaledellericerche/cool-jconon/master/docs/screenshot/responsive_it.png: Get https://raw.githubusercontent.com/consiglionazionaledellericerche/cool-jconon/master/docs/screenshot/responsive_it.png: dial tcp 151.101.36.133:443: i/o timeout\ndescription/en/screenshots: HTTP GET failed for https://raw.githubusercontent.com/consiglionazionaledellericerche/cool-jconon/master/docs/screenshot/home_en.png: Get https://raw.githubusercontent.com/consiglionazionaledellericerche/cool-jconon/master/docs/screenshot/home_en.png: dial tcp 151.101.36.133:443: i/o timeout"
time="2020-09-24T08:35:14Z" level=error msg="Error parsing publiccode.yml: description/it/screenshots: HTTP GET failed for https://raw.githubusercontent.com/vvfosprojects/sovvf/master/doc/images/dashboard.jpg: Get https://raw.githubusercontent.com/vvfosprojects/sovvf/master/doc/images/dashboard.jpg: dial tcp 151.101.36.133:443: i/o timeout"
time="2020-09-24T08:35:14Z" level=error msg="Error parsing publiccode.yml: description/it/screenshots: HTTP GET failed for https://raw.githubusercontent.com/IstitutoCentraleCatalogoUnicoBiblio/Nuovo-Opac-di-Polo-SBN/master/screenshots/nuovo_opac.png: Get https://raw.githubusercontent.com/IstitutoCentraleCatalogoUnicoBiblio/Nuovo-Opac-di-Polo-SBN/master/screenshots/nuovo_opac.png: dial tcp 151.101.36.133:443: i/o timeout"
#189 dramatically decreases the frequency of this happening.
Keeping this open though, because the root issue is not resolved.
We could consider the root cause to be the number of concurrent processes and close this, wdyt @bfabio?
@sebbalex I'm not convinced, there must be something wrong in the code that doesn't handle git failures correctly and still resolves the URL as relative. Most (all?) of the failures were caused by concurrency, but the crawler should have stopped processing the repo as soon as they happened.
This doesn't apply anymore.
After #302 the crawler doesn't touch publiccode.yml's contents. API consumers are now in charge of doing the expansion, if they need it.
When the path in logo is relative, the crawler is supposed to expand it to the full URL and export the normalized file, but sometimes it doesn't do it:
today:
August 16th: