Closed ghost closed 11 months ago
Which list are you seeing this on?
That is not important. I need to see which domains are considered non-domains. (My domain list, which I import, contains only a-z, 0-9, dot, and hyphen.)
We cannot help you if you're not helping us to help you. We cannot reproduce the issue without knowing which list you're seeing issues with. This is not going to work.
These are the regex patterns we use to define valid domains:
I guess you could manually parse your list to see if any lines don't fit those patterns.
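One way to do that manual check is a short script that reports every line containing a byte outside the character set the list is supposed to use. This is only a sketch: the allowed set below (a-z, 0-9, dot, hyphen) is taken from the reporter's description of the list, not from the regex patterns gravity actually uses, and the name find_suspect_lines is made up for illustration.

```python
import re

# Assumption: a "clean" line consists only of these bytes. This mirrors the
# reporter's description of the list, not gravity's own validation patterns.
ALLOWED_LINE = re.compile(rb"[a-z0-9.-]+")


def find_suspect_lines(data: bytes):
    """Return (line_number, raw_line) pairs for lines with unexpected bytes."""
    suspects = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        line = line.rstrip(b"\r")
        # Empty lines are skipped; anything else must match the allowed set.
        if line and not ALLOWED_LINE.fullmatch(line):
            suspects.append((number, line))
    return suspects


# Small in-memory demonstration: the second line carries two NUL bytes.
sample = b"example.com\n\x00\x00bad.example\nexample.org\n"
for number, line in find_suspect_lines(sample):
    print(f"{number}: {line!r}")
```

For a real list, read the file in binary mode (for example open("hosts3.csv", "rb").read()) so that NUL and other control bytes reach the check unchanged.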
Hi PromoFaux, thank you very much for the info. I will look into it.
But the question was "I need to see which domains are considered non-domains", so it is not about which entries fail to fit those patterns. I have 151 non-domains that don't fit those patterns. I just don't see the 151, or even some of them, on my screen.
Usually the gravity output would show a sample of 5 of them, but it seems there might be something "special" about your list that is preventing it from showing anything other than the first empty string value.
This is why it would be helpful to have visibility of the list, as we could further analyse/troubleshoot/debug
It's also worth noting that gravity shows unique non-domains. If your list has 151 empty "domains" then the seen output is expected.
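To illustrate why 151 bad lines could produce a single blank sample: if each rejected entry ends up being read as an empty string, deduplicating them leaves exactly one value. The sketch below assumes this happens because a C string ends at its first NUL byte; that is a guess for illustration, not something confirmed from gravity's source. The helper names as_c_string and unique_samples, and the sample entries, are hypothetical.

```python
def as_c_string(raw: bytes) -> bytes:
    """Mimic how C string functions see a buffer: stop at the first NUL."""
    return raw.split(b"\x00", 1)[0]


def unique_samples(entries, limit=5):
    """Return up to `limit` distinct entries, preserving first-seen order."""
    seen = []
    for entry in entries:
        if entry not in seen:
            seen.append(entry)
        if len(seen) == limit:
            break
    return seen


# Hypothetical rejected lines: each one begins with a NUL byte.
rejected = [b"\x00direktpaket.com", b"\x00donzidirect.com", b"\x00gmx.net"]
print(unique_samples(as_c_string(line) for line in rejected))
```

Under that assumption every rejected line collapses to the same empty value, so a "sample of 5 unique entries" has only one thing to show.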
I think that my non-domains, which should go into the invalid_domains_list[i] array, are not listed because they are not false positives according to the regex (line 215 of gravity-parseList.c).
Here are my files: hosts3.csv, hosts2.csv, hosts1.csv
False positives are suppressed, but there are just a few items considered false positives.
This is the list of false positives:
Do you have any of these entries?
hosts3.csv is a defective file. It contains some binary data in these lines:
1126341:direktpaket.com
1126342:donzidirect.com
1126343:dowslakemicro.com
1126344:dragoman.com
1126345:drisner-trockenbau.de
1126346:e.fivebelow.com
1126347:fav7bhn0.atlassian.net
1126348:fca-worldwide.com
1126349:forgela.com
1126350:freshworks.com
1126351:gmx.net
1126352:gotowebinar.com
1126353:halfsow.shop
1126354:hell.sighnun.shop
1126355:hicglobalsolution.com
1126356:hotsighning.com
1126357:icloud.com
1126358:infrac2.ddns.net
1126359:infraccion18.ddns.net
1126360:itariannotifications.com
1126361:jouw-pensioen.nl
1126362:jssgallery.org
1126363:judecollins.com
1126364:just-in-time-racing.com
1126365:katiestevens.net
1126366:keelhauler.org
1126367:km.maarhoudcontact.com
1126368:leadpartners24.nl
1126369:luelstudio.com
1126370:magicduino.com
1126371:mail.app.com
1126372:maxxtrend.nl
1126373:medianews24.nl
1126374:news.sedo.com
1126375:pro-versender.com
1126376:profiprodukte.net
1126377:profiverkauf.com
1126378:qualiview.nl
1126379:realspouse.com
1126380:riotops.com
1126381:routezilla.com
1126382:seniorenvoordeelpas.nl
1126383:snapmood.shop
1126384:spielendraussen.de
1126385:stackoftuts.com
1126386:starmodernfurniture.com
1126387:successoverpass.com
1126388:successwithkenny.info
1126389:sunnysideas.com
1126390:tahiti.com
1126391:take.sighnun.shop
1126392:thecircuitdetective.com
1126393:thedailyracquet.com
1126394:us.pycon.org
1126395:versender50.com
1126396:versenderbuero.com
1126397:wcr-datacontrol.info
1126398:werkzeugeonline.net
1126399:wxs.nl
1126400:adamslaboratory.com
1126401:afiph.org
1126402:archeinconsultants.com
1126403:armada.mil.ec
1126404:beach-north.com
1126405:becomeuagain.com
1126406:boomcomunicazione.com
1126407:c14.tez.host
1126408:campusleeuwarden.com
1126409:cas.menshealthyagain.com
1126410:centreforglobaleducation.com
1126411:chmcok.com
1126412:chs-deutschland.de
1126413:cluster.com
1126414:comeonconnect.com
1126415:coupon1euro.com
1126416:dd12postapoc.com
1126417:designpartnersindonesia.com
1126418:devip2.noc401.com
1126419:dogcareco.com
1126420:easthartford.org
1126421:fivebelow.com
1126422:generatorenprofis.net
1126423:generatorexperten.com
1126424:giapeaservices.com
1126425:goproswimtri.com
1126426:goraifilms.com
1126427:gowologlobal.com
1126428:grandestar.net
1126429:han-solo.net
1126430:hemafoundation.org
1126431:hirallabs.com
1126432:inmotionhosting.com
1126433:joinaff.com
1126434:joypluscondoms.com
1126435:kawaramachi-ai.com
1126436:kvk.nl
1126437:loopevolutionrecords.com
1126438:lottiecooper.lc
1126439:mail2you.club
1126440:markenhandelonline.com
1126441:meltingpotaz.com
1126442:metrostroy.com
1126443:mgdgirlsguild.org
1126444:mobile-stromerzeuger.com
1126445:moghadamzaferan.com
1126446:muenster.de
1126447:mumrests.com
1126448:murakamitatami.com
1126449:nextnewcustomer.com
1126450:nickblattfilms.com
1126451:opensea.io
1126452:ordenlaw.com
1126453:ovathemes.com
1126454:ovh.net
1126455:premium232.web-hosting.com
1126456:premium81.web-hosting.com
1126457:rackharbor.com
1126458:rent355.com
1126459:repois2020.com
1126460:rs.wewehost.com
1126461:ruleengineering.com
1126462:ryanmorel.com
1126463:s15.avl4.acemsrvd.com
1126464:s3.csa1.acemsd2.com
1126465:s6.csa1.acemsd3.com
1126466:se1.ezhostingserver.com
1126467:seobrand.net
1126468:server.vromsystems.com
1126469:sewingshoppe318.com
1126470:sgg-egypt.com
1126471:sharonlouisephotography.com
1126472:splus-s.com
1126473:sspatra.com
1126474:stage-app.nl
1126475:statecensus.info
1126476:stromgeneratoren-handel.com
1126477:talaskurutma.com
1126478:testsendblaster.com
1126479:tin.it
1126480:tonepit.rest
1126481:trinec.org
1126482:uitgekookt.nl
1126483:unsub.spmta.com
1126484:uttoron.com
1126485:vidtour.shop
1126486:warenoutlet.net
1126487:werkzeughandeldirekt.net
1126488:xml-io.proteusthemes.com
1126489:xmsnet.nl
1126490:z-kompass.com
1126491:zakelijk-diensten.nl
151 Lines affected.
OK, thank you for finding this issue. I use $in = preg_replace('/[^a-zA-Z0-9.-]/s','',$in); in PHP to clean up binary codes. I don't know the C language. Maybe it's an idea to put something like this into gravity-parseList.c.
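For readers who preprocess lists outside PHP, the same cleanup can be sketched in Python. This is a rough equivalent of the preg_replace call above, applied line by line so that the newlines separating entries are kept. The name clean_list is invented for this example, and nothing here reflects how gravity-parseList.c handles its input.

```python
import re

# Same character class as the PHP pattern: drop every byte that is not
# a letter, digit, dot, or hyphen.
DISALLOWED = re.compile(rb"[^a-zA-Z0-9.-]")


def clean_list(data: bytes) -> bytes:
    """Strip disallowed bytes from each line and drop lines left empty."""
    cleaned = []
    for line in data.split(b"\n"):
        line = DISALLOWED.sub(b"", line)
        if line:
            cleaned.append(line)
    # Rejoin with one entry per line and a trailing newline.
    return b"\n".join(cleaned) + b"\n" if cleaned else b""
```

Note that stripping bytes silently can turn a corrupted entry into a different, valid-looking domain, so it may be worth logging the affected lines before cleaning them.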
(my domain list, which i import, contains only a-z 0-9 and dot and hyphen)
Actually your file contains many NULL characters between lines 1126342 and 1126492:
Fixed with the linked PR.
Versions
Platform
Linux raspberrypi 6.1.21-v8+
Expected behavior
non-domains entries not visible
Actual behavior / bug
Parsed 3262783 exact domains and 0 ABP-style domains (ignored 151 non-domain entries) Sample of non-domain entries:
Steps to reproduce
Steps to reproduce the behavior:
Debug Token
Screenshots
Additional context
It would be great if I could see at least 5 non-domain entries.