hadiasghari / pyasn

Python IP address to Autonomous System Number lookup module. (Supports fast local lookups, and historical lookups using archived BGP dumps.)

Download from routeviews via HTTP additional to FTP #64

Open ghost opened 4 years ago

ghost commented 4 years ago

Currently, pyasn_util_download.py downloads from routeviews.org via FTP. However, HTTP proxies are mandatory in some environments, and there HTTP is preferred or is the only option.
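
To illustrate the request, here is a minimal sketch of a proxy-aware HTTP download using only the standard library. It is not pyasn's code: the function names, the archive URL layout in `build_rib_url`, and the proxy address in the usage note are assumptions for illustration.

```python
import shutil
import urllib.request
from datetime import datetime


def build_rib_url(ts, base="http://archive.routeviews.org/bgpdata"):
    """Build a RIB URL for a timestamp; the path layout is an assumption."""
    return "{}/{:%Y.%m}/RIBS/rib.{:%Y%m%d.%H%M}.bz2".format(base, ts, ts)


def make_opener(proxy=None):
    """Return an opener that routes HTTP(S) requests through `proxy`.

    With proxy=None, urllib falls back to its default behaviour, which
    honours the http_proxy / https_proxy environment variables.
    """
    if proxy is None:
        return urllib.request.build_opener()
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def download(url, dest, proxy=None):
    """Stream `url` into the file `dest` without holding it in memory."""
    opener = make_opener(proxy)
    with opener.open(url, timeout=60) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

A call could look like `download(build_rib_url(datetime(2021, 8, 31)), "rib.bz2", proxy="http://proxy.example:3128")`, where the proxy address is a placeholder.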

hadiasghari commented 4 years ago

@wagner-certat if you have any PRs on this I'd be happy to merge :)

ghost commented 4 years ago

In IntelMQ we have now integrated the pyasn database update functionality, so there's no overlap of interest or potential for synergies. I therefore doubt that I'll provide a PR for this in the near future :/

hadiasghari commented 4 years ago

No problem. I'll close this for now, as I think some might also simply use wget. If there is interest in the future we can revisit this issue. Thanks!

mansweet commented 3 years ago

I think I have an interest in this feature since where I'm trying to do the download from is firewalled off from ftp. I'll poke around and try to submit a PR @hadiasghari

mansweet commented 3 years ago

@hadiasghari I think I've found a way to do this.

Do you have any other requests as to how this feature is implemented?

Lastly, since I'm currently working on this, would you mind re-opening the issue? If it's not fixed within a month, feel free to close it again.

For the record, I'm working on this in a feature branch of my fork: https://github.com/mansweet/pyasn/tree/add-http-download-method

mansweet commented 3 years ago

@hadiasghari I've got my changes ready over at: https://github.com/hadiasghari/pyasn/compare/master...mansweet:add-http-download-method?expand=1

I need to do some git cleanup, since changes from a previous PR I submitted to your repo seem to be included in this one. I suppose those might be automagically resolved if you approve and merge PR #69. Let me know what you want to do with my previous PR; depending on that, I'll clean up my branch if needed and then submit a PR so we can discuss it publicly there.

Thank you for your time, consideration and maintenance of this project!

hadiasghari commented 3 years ago

Hi @mansweet, thank you for the PR, I'll need a few days to get to this.

hadiasghari commented 3 years ago

@mansweet thank you for the commits. I approved and merged PR #69. Please send a new PR so I can run tests/check and then merge/approve.

Note, regarding the feature list you mention, I agree with all of it except this: "I'll keep it simple and just add functionality to download the latest rib file, if that's alright." I feel it would be unnecessarily complicated if the FTP option could handle different dates but the HTTP option only the latest. Would it be too difficult to allow different dates (since the date parsing logic is already implemented)?

Additionally, will it also support https downloads?

Thanks :)

mansweet commented 3 years ago

@hadiasghari thanks for merging #69 and re-opening this issue.

I agree with your request and think that the different options should offer uniform functionality. However, correct me if I'm wrong, but it seems that the FTP path can only download the latest file, while the HTTP path (as of now) can download specific dates (with the --dates-from-file CLI arg). I think my feature here just adds the ability to download the latest file from the HTTP source without requiring the user to supply a file of dates.

It might then make sense, in a separate scope, to simplify the interface overall such that a user can:

Now, as for the HTTPS downloads you've requested, I can look into that. Can you tell me which source I should fetch the RIBs from? https://archive.routeviews.org does not resolve.

Additionally, another inconsistency is that the HTTP method only allows IPv4 downloads (while FTP can do any of them). Could you point me to the IPv6 source I should fetch via HTTP?

The PR for the feature, as it stands, can be found at https://github.com/hadiasghari/pyasn/pull/72

Also, just an interesting find. Today is the 31st of August. It looks like routeviews creates the directory for the next month a little early, without populating it. Just pointing this out in case you find some strange corner cases in the future!

mansweet commented 3 years ago

hi @hadiasghari , any update on merging this PR?

ghost commented 3 years ago

> Also, just an interesting find. Today is the 31st of August. It looks like routeviews makes the directory for the next month perhaps a little bit early without populating it. Just point this out in case you find some strange corner cases in the future!

That hit us in certtools/intelmq#2088 as well

hadiasghari commented 3 years ago

Hi @mansweet, thanks for the PR. I posted an update to the PR conversation, please see that.

@mansweet @wagner-certat regarding the last day of month edge case (which is probably a change on Routeviews side), we could go back one day if nothing is found in the current day.
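
A minimal sketch of that fallback, assuming a hypothetical `has_rib` callable that reports whether a dump exists for a given day (for example, after listing that month's directory). The function name and the `max_back` limit are illustrative, not part of pyasn.

```python
from datetime import date, timedelta


def find_latest_rib_day(has_rib, start, max_back=3):
    """Return the most recent day on or before `start` that has a RIB file.

    `has_rib` is a hypothetical callable taking a date and returning True
    if a dump exists for that day. Returns None if nothing is found
    within `max_back` days of `start`.
    """
    for offset in range(max_back + 1):
        day = start - timedelta(days=offset)
        if has_rib(day):
            return day
    return None
```

Stepping back by calendar day also crosses month boundaries, so an empty directory for a new month leads to the last day of the previous one.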

ghost commented 3 years ago

The monthly dirs are sorted here:

    months = sorted(ftp.nlst(archive_root), reverse=True)  # e.g. 'route-views6/bgpdata/2016.12'

We could also always use the current month, instead of just the newest directory.
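
Both ideas can be sketched roughly as follows. The helper names are hypothetical, and `list_files` stands in for whatever lists a directory (for FTP it could wrap `ftp.nlst`); a real wrapper may also need to handle servers that report an empty directory as an error.

```python
from datetime import date


def current_month_dir(archive_root, today):
    """Directory name for the month of `today`, e.g. '.../2021.08'."""
    return "{}/{:%Y.%m}".format(archive_root.rstrip("/"), today)


def newest_nonempty_month(month_dirs, list_files):
    """Return the newest month directory that contains files, or None.

    `list_files` is a hypothetical callable that returns the entries
    of a directory as a list.
    """
    for month in sorted(month_dirs, reverse=True):
        if list_files(month):
            return month
    return None
```

Using the current month avoids the empty next-month directory, while skipping empty directories also covers the case where the current month has not been populated yet.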