mm1ke / gentoo-scripts

Checks the Gentoo Portage tree for various problems
https://gentooqa.levelnine.at/

false positive srctest #13

Closed: jonasstein closed this issue 6 years ago

jonasstein commented 7 years ago

I think this is a false positive:

http://gentoo.levelnine.at/srctest/sort-by-maintainer/tex_at_g.o.txt
dev-tex/latex2html|http://mirrors.ctan.org/support/latex2html/latex2html-2017.2.tar.gz|tex@gentoo.org:
mm1ke commented 7 years ago

Hmm, I just checked the package with srctest and now it's correct (it's online). I don't think it's a real false positive in the sense of being wrongly matched as offline, because maybe the package really was offline at the time of testing, even though that is unlikely because the other packages from that site were online. Maybe I just had some trouble with my internet connection at that time, or it's because of the use of parallel. At least the script seems to work correctly. I will keep an eye on this.

However, don't forget that srctest (and actually every other script too) only takes a snapshot of every package at test time, which doesn't necessarily reflect the current status. :)

mm1ke commented 7 years ago

OK, forget what I just said. It's clearly a parallel problem: the full list shows the file with two statuses:

available|dev-tex/latex2html;http://mirrors.ctan.org/support/latex2html/latex2html-2017.2.tar.gz;tex@gentoo.org:
not_available|dev-tex/latex2html|http://mirrors.ctan.org/support/latex2html/latex2html-2017.2.tar.gz|tex@gentoo.org:

Hmm, that will be interesting to fix...

mm1ke commented 7 years ago

OK, I found a bug in my srctest script: the entries for available packages didn't use the new delimiter yet (note the ';' instead of '|' in the first line above). Even though I'm not sure this caused the problem, I will wait until tomorrow, because then it's easier to check the result.

On a similar note, I've also checked the wwwtest result. If there were a similar problem (a website with two statuses), the outputs of curl -s http://gentoo.levelnine.at/wwwtest/full.txt|cut -d'|' -f4|sort -u|wc and curl -s http://gentoo.levelnine.at/wwwtest/full.txt|cut -d'|' -f1,4|sort -u|wc would differ. In fact, at the moment they do differ, but only by one line, and that is because there are 5 lines which don't use the delimiter (search for "200 http"). I don't know yet why that happened; however, I don't think it's related to parallel, as in that case I'd expect it to happen more often.
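Comparing line counts only tells us *that* some URL has more than one status, not which one. A minimal sketch of a check that names the offending URLs, assuming the wwwtest full.txt layout implied by the commands above ('|' as delimiter, status in field 1, URL in field 4). The function names conflicting_urls and short_records are hypothetical, not part of the repository.

```sh
#!/bin/sh
# Sketch: find wwwtest entries whose URL was recorded with more than one status.
# Assumed layout of full.txt: '|'-delimited, status in field 1, URL in field 4.

# Print each URL that appears with two or more distinct statuses (sorted).
conflicting_urls() {
    awk -F'|' 'NF >= 4 {
        key = $4 SUBSEP $1          # one key per (URL, status) pair
        if (!(key in seen)) { seen[key] = 1; count[$4]++ }
    }
    END { for (url in count) if (count[url] > 1) print url }' "$1" | sort
}

# Count records with fewer than 4 fields, e.g. lines written without the delimiter.
short_records() {
    awk -F'|' 'NF < 4 { n++ } END { print n + 0 }' "$1"
}
```

Usage: save the list first (curl -s http://gentoo.levelnine.at/wwwtest/full.txt -o full.txt), then run conflicting_urls full.txt and short_records full.txt.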

mm1ke commented 7 years ago

OK, today we got a correct result and I could check it with the above command. Fortunately the difference is just 5 packages with different statuses. Considering I'm checking about 22000 packages each run, I think this is negligible, and it will probably be eliminated anyway, because I found a different "problem".

The more interesting aspect of my checks is the output of the above commands once with sort -u and once with plain sort. The output shows that I have nearly 10000 duplicates, which, as of now, get checked each time as well! Unlike wwwtest, which first checks a tmpfile for already-checked homepages, srctest doesn't do that. The reason was that I thought every new package (version) also has a new file to download. What I hadn't thought about were revisions of packages, patch files which get applied to multiple versions, and probably other cases where files get used multiple times.
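A minimal sketch of the missing deduplication: every distinct URL is checked once, the results go to a temporary file (similar in spirit to wwwtest's tmpfile), and each record is then joined with its URL's result. It assumes records of the form package|url|maintainers with the URL in field 2; check_url and check_unique are hypothetical helpers, and check_url simply relies on wget --spider exiting with 0 when the remote file exists.

```sh
#!/bin/sh
# Sketch: check each distinct URL once, then attach the result to every record.
# Assumes records "package|url|maintainers" (URL in field 2) in the file "$1".

# Hypothetical helper: wget --spider exits 0 when the remote file exists.
check_url() {
    if wget -q --spider --timeout=10 "$1"; then
        echo available
    else
        echo not_available
    fi
}

check_unique() {
    results=$(mktemp) || return 1
    # One check per distinct URL; store "url|status" lines in the temp file.
    cut -d'|' -f2 "$1" | sort -u | while IFS= read -r url; do
        printf '%s|%s\n' "$url" "$(check_url "$url")"
    done > "$results"
    # Join: load url -> status from the temp file, then prefix every record.
    awk -F'|' 'NR == FNR { status[$1] = $2; next }
               { print status[$2] "|" $0 }' "$results" "$1"
    rm -f "$results"
}
```

With ~10k duplicate URLs removed this way, the number of network checks per run drops to the number of distinct URLs, while the report still lists every package.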

After all, the good news is that this will be fixed tonight. :) And the even better news is that this will cut the run time a lot: 10k fewer packages to check will clearly speed up the script again. Looking forward to seeing how much faster it will be. :)

jonasstein commented 7 years ago

This is another false positive (probably fixed by tomorrow's run, or caused by something else):

http://gentoo.levelnine.at/srctest/sort-by-maintainer/lxqt_at_g.o.txt
x11-misc/obconf-qt|https://dev.gentoo.org/~jauhien/distfiles/obconf-qt-0.9.0_p20150729.tar.gz|lxqt@gentoo.org:

The package version is missing there too; it should be x11-misc/obconf-qt-0.9.0_p20150729.

mm1ke commented 7 years ago

Hi,

The package version will now be included too. I just restarted the script, as it went wild overnight (somehow the full listing grew to about 5 GB!). I'm not sure what the reason for that is, but I made some minor changes. We will see what happens next.

mm1ke commented 7 years ago

Looks good now. :)

jonasstein commented 7 years ago

Not yet: in http://gentoo.levelnine.at/wwwtest/sort-by-maintainer/sebastian%2Bdev_at_ramacher.at.txt, https://pwmt.org/projects/zathura/ is listed with 000, but I can open it. 000 means the server is not available, right?

mm1ke commented 7 years ago

Not necessarily. 000 means the script didn't get a response (HTTP code) from the website within the timeout, which is 10 seconds.

To be clear, 000 isn't an HTTP code; it's just a placeholder for "no status".
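To illustrate where a 000 can come from, here is a sketch that assumes wwwtest queries homepages with curl (the thread does not say which tool it uses): curl's -w '%{http_code}' writes 000 when no HTTP response was received, and --max-time 10 mirrors the 10-second timeout mentioned above. http_status and describe_status are hypothetical names.

```sh
#!/bin/sh
# Sketch, assuming curl is used: -w '%{http_code}' prints 000 if no HTTP
# response arrived (DNS failure, refused connection, or the 10 s timeout).

# Print the status code of a homepage without saving the body.
http_status() {
    curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# Map a code to a label for the report; 000 is a placeholder, not an HTTP code.
describe_status() {
    case $1 in
        000) echo "no response" ;;
        2??) echo "ok" ;;
        3??) echo "redirect" ;;
        *)   echo "http error $1" ;;
    esac
}
```

For example, describe_status "$(http_status https://pwmt.org/projects/zathura/)" prints "no response" whenever the site did not answer within 10 seconds, even if it loads fine a moment later in a browser.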

jonasstein commented 7 years ago

In http://gentoo.levelnine.at/srctest/sort-by-maintainer/proxy-maint_at_g.o.txt:

sys-process/minit|minit-0.10|http://dl.fefe.de/minit-0.10.tar.bz2|aw-gentoo@instandbesetzt.net:proxy-maint@gentoo.org:

This URL works for me. Could we have the status number back in the log? Perhaps it would help to track false positives better and find a solution.
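Bringing the status number back could look like the following sketch. It assumes srctest runs wget --spider with -S (--server-response), which prints the server's status lines, indented, to stderr; after redirects, the last status line is the one that counts. last_status_code and status_record are hypothetical, and 000 is reused as the "no status" placeholder from wwwtest.

```sh
#!/bin/sh
# Extract the final HTTP status code from "wget --spider -S" output on stdin.
# With redirects several "HTTP/1.x NNN ..." lines appear; keep the last one.
last_status_code() {
    awk '$1 ~ /^HTTP\// { code = $2 }
         END { print (code == "" ? "000" : code) }'
}

# Hypothetical: check URL "$1" and print "code|record" for record "$2".
status_record() {
    code=$(wget --spider -S --timeout=10 "$1" 2>&1 | last_status_code)
    printf '%s|%s\n' "$code" "$2"
}
```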

jonasstein commented 7 years ago

More false positives: https://bugs.gentoo.org/637012

The http://www.skarnet.org entries in https://gentoo.levelnine.at/full-sort-by-maintainer/williamh_at_g.o.txt are false positives.

NP had an idea: it could be that our parallel fetches trigger a server-side limit. It would be interesting to interleave the fetches so that we do not hit one server 8 times in parallel.
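A sketch of NP's idea, assuming a plain list of URLs (one per line) that is later handed to parallel in input order: each URL is ranked by how many earlier URLs share its host, and sorting by that rank (ties kept in input order) spreads requests to the same server across the run. This lowers the chance of concurrent hits on one host but does not strictly prevent them. interleave_by_host is a hypothetical name.

```sh
#!/bin/sh
# Reorder URLs (one per line on stdin) so consecutive entries mostly hit
# different hosts: 1st URL of every host, then 2nd URL of every host, etc.
interleave_by_host() {
    # For "scheme://host/path", splitting on "/" puts the host in field 3.
    awk -F/ '{ host = $3; rank = seen[host]++; print rank "\t" NR "\t" $0 }' |
        sort -n -k1,1 -k2,2 |   # by rank, then by original line number
        cut -f3-                # drop the two helper columns
}
```

The output of interleave_by_host < urls.txt can replace the unsorted list that is fed to the existing parallel call.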

mm1ke commented 6 years ago

Hi,

After looking into your false positives, I finally found some problems with the script. :) To explain what went wrong, I first have to explain how the script works. srctest uses wget's spider functionality to get the HTTP status code for each file it checks. If it finds a particular text (which was "HTTP/1.1 200 OK"), it decides the package must be online. However, your false positives return slightly different status lines; so far I have seen "HTTP/1.0 200 OK" and "HTTP/1.1 200 Coming Up". Since I didn't check for those, the files were marked as unavailable.

I've now changed the script to check only for the text 'Remote file exists.', which wget's spider mode also prints. This should fix a whole bunch of false positives.
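The difference between the old and the new check, as a sketch: the old test required the exact line HTTP/1.1 200 OK, while the new one relies on the verdict line that wget prints in spider mode. The function names, the 10-second timeout and the available/not_available labels (borrowed from the full list above) are assumptions, not the script's actual code. The pattern deliberately omits the trailing period so that it also matches wget's longer "Remote file exists and could contain further links" message for HTML pages.

```sh
#!/bin/sh
# Old approach (too strict): "HTTP/1.0 200 OK" or "HTTP/1.1 200 Coming Up" fail.
is_available_old() {
    grep -q 'HTTP/1.1 200 OK'
}

# New approach: trust wget's own verdict line, whatever the status text says.
is_available() {
    grep -q 'Remote file exists'
}

# Hypothetical wrapper: wget writes its spider report to stderr.
check_src() {
    if wget --spider --timeout=10 "$1" 2>&1 | is_available; then
        echo available
    else
        echo not_available
    fi
}
```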

mm1ke commented 6 years ago

Nope, this only fixes srctest. wwwtest works differently.

jonasstein commented 6 years ago

Great, it seems to work for srctest and wwwtest now. I cannot find any of the old false positives. Thank you.

mm1ke commented 6 years ago

No problem, please let me know if you find any others. :)