socketry / async-http

MIT License
320 stars 46 forks source link

Possible bug: Downloaded file differs with that of Firefox #121

Closed marek22k closed 1 year ago

marek22k commented 1 year ago

Hello,

I downloaded the same file with Firefox and Async, but the one from Async is 0.1MB bigger and invalid (is a bz2 file). What is the reason for this? Files: files.zip Code:

require 'logger'
require 'async'
require 'async/barrier'
require 'async/http/internet'
require_relative 'helper/config'

config = Configuration::Configuration.new.validate!
logger = config.create_logger

mrts = [config.table4_file, config.table6_file]

Async do
  internet = Async::HTTP::Internet.new
  barrier = Async::Barrier.new

  mrts.each do |mrt|
    barrier.async do
      begin
        logger.info "Download #{mrt}."
        response = internet.get("#{config.collector}/#{mrt}")

        case response.status
        when 200
          logger.info "Download of #{mrt} complete."
          response.save(mrt)
          logger.info "#{mrt} successful saved."
        else
          raise "Failed to download #{mrt}."
        end
      rescue RuntimeError => e
        logger.error(e.message)
      end
    end
  end
ensure
  internet.close
end
ioquatix commented 1 year ago

Can you give me the two URLs?

marek22k commented 1 year ago

https://mrt.collector.dn42/master4_latest.mrt.bz2 But I think without access to the dn42/NeoNetwork/CRXN/Freifunk/ChaosVPN it might be difficult to download anything there. Screenshot at 2023-06-02 15-32-45

ioquatix commented 1 year ago

Is there a host I can SSH into for testing?

ioquatix commented 1 year ago
samuel@sakura ~/D/files> bunzip2 async.mrt.bz2 

bunzip2: async.mrt.bz2: trailing garbage after EOF ignored
samuel@sakura ~/D/files> bunzip2 firefox.mrt.bz2 
samuel@sakura ~/D/files> ls -lah
total 255472
drwx------@  4 samuel  staff   128B  2 Jun 23:02 ./
drwxr-xr-x@ 36 samuel  staff   1.1K  2 Jun 22:59 ../
-rw-------   1 samuel  staff    62M  2 Jun 13:19 async.mrt
-rw-r--r--   1 samuel  staff    62M  2 Jun 13:19 firefox.mrt
samuel@sakura ~/D/files> shasum async.mrt 
0cb78bff3b4239cb8745407c4bc16f596540f654  async.mrt
samuel@sakura ~/D/files> shasum firefox.mrt 
0cb78bff3b4239cb8745407c4bc16f596540f654  firefox.mrt

The statement bunzip2: async.mrt.bz2: trailing garbage after EOF ignored is worrying.. let me check how that could be happening.

marek22k commented 1 year ago

If you want me to run a modified version of the code, feel free to tell me.

As far as I know there is no SSH host or similar - the only thing I could offer is to provide you with a WireGuard tunnel (PM https://mk16.de/about_me/)

ioquatix commented 1 year ago

I am going to try and repro locally.

ioquatix commented 1 year ago

Firstly, can you run it with CONSOLE_LEVEL=debug ... and give me the output.

marek22k commented 1 year ago

The statement bunzip2: async.mrt.bz2: trailing garbage after EOF ignored is worrying.. let me check how that could be happening.

Mhh, that might explain why another implementation, can read the file anyway. To extract the MRT file I use bzip2-ffi - maybe bzip2-ffi can't handle it.

marek22k commented 1 year ago

Firstly, can you run it with CONSOLE_LEVEL=debug ... and give me the output.

I, [2023-06-02T16:09:50.703326 #9879]  INFO -- : Download master4_latest.mrt.bz2.
  0.0s    debug: Async::Pool::Controller [oid=0x30c] [ec=0x320] [pid=9879] [2023-06-02 16:09:50 +0200]
               | No available resources, allocating new one...
I, [2023-06-02T16:09:50.703650 #9879]  INFO -- : Download master6_latest.mrt.bz2.
  0.0s    debug: Async::HTTP::Client [oid=0x334] [ec=0x320] [pid=9879] [2023-06-02 16:09:50 +0200]
               | Making connection to #<Async::HTTP::Endpoint https://mrt.collector.dn42/master4_latest.mrt.bz2 {}>
 0.06s    debug: Async::IO::Socket [ec=0x320] [pid=9879] [2023-06-02 16:09:50 +0200]
               | Connecting to #<Addrinfo: 172.20.129.7:443 TCP (mrt.collector.dn42)>
 0.14s    debug: Async::HTTP::Protocol::HTTPS [ec=0x320] [pid=9879] [2023-06-02 16:09:50 +0200]
               | Negotiating protocol "h2"...
 0.18s    debug: Async::Pool::Controller [oid=0x30c] [ec=0x35c] [pid=9879] [2023-06-02 16:09:50 +0200]
               | Wait for resource -> #<Async::HTTP::Protocol::HTTP2::Client 0 active streams>
 0.18s    debug: Async::Pool::Controller [oid=0x30c] [ec=0x320] [pid=9879] [2023-06-02 16:09:50 +0200]
               | Wait for resource -> #<Async::HTTP::Protocol::HTTP2::Client 1 active streams>
I, [2023-06-02T16:09:50.975347 #9879]  INFO -- : Download of master6_latest.mrt.bz2 complete.
I, [2023-06-02T16:09:51.031937 #9879]  INFO -- : Download of master4_latest.mrt.bz2 complete.
 1.55s    debug: Async::Pool::Controller [oid=0x30c] [ec=0x320] [pid=9879] [2023-06-02 16:09:52 +0200]
               | Reuse #<Async::HTTP::Protocol::HTTP2::Client 1 active streams>
I, [2023-06-02T16:09:52.253842 #9879]  INFO -- : master4_latest.mrt.bz2 successful saved.
 2.48s    debug: Async::Pool::Controller [oid=0x30c] [ec=0x35c] [pid=9879] [2023-06-02 16:09:53 +0200]
               | Reuse #<Async::HTTP::Protocol::HTTP2::Client 0 active streams>
I, [2023-06-02T16:09:53.187483 #9879]  INFO -- : master6_latest.mrt.bz2 successful saved.
 2.48s    debug: Async::Reactor [oid=0x370] [ec=0x384] [pid=9879] [2023-06-02 16:09:53 +0200]
               | Exiting run-loop because finished.
marek22k commented 1 year ago

It also works with the native Ruby HTTP client. (though I would prefer to use async as I would then have it consistent). https://codeberg.org/mark22k/BgpLookup/src/commit/02bddd5e8ffe5920e9573a434ef8d03791b3a827/get_mrts.rb

ioquatix commented 1 year ago

Native Ruby client only supports HTTP/1 which is probably the big difference.

ioquatix commented 1 year ago
          logger.info "Download of #{mrt} complete."

          Console.logger.info(response, body: response.body.inspect, length: response.body.length, headers: response.headers.to_h)

Can you add that above log line and let me know the output.

ioquatix commented 1 year ago

I might have figured it out...

ioquatix commented 1 year ago

Can you update protocol-http and check if the issue still exists?

marek22k commented 1 year ago

As far as I can tell, everything is working fine again now. Thanks a lot!

ioquatix commented 1 year ago

Thanks for your bug report.