rgrinberg / bencode

Bencode (.torrent file format) reader/writer in OCaml
24 stars 4 forks source link

pp: implement Format.formatter-style pretty-printer #9

Closed cfcs closed 4 years ago

cfcs commented 6 years ago

I didn't touch pretty_print as I didn't want to break the API, and pp seems to be the conventional name for these printers.

The output generated looks like this:

{  ( "announce", <string:40:"udp://tracker.leechers-paradise.org:6969"> );
   ( "announce-list":
      [  <string:40:"udp://tracker.leechers-paradise.org:6969">
         <string:21:"udp://zer0day.ch:1337">
         <string:27:"udp://open.demonii.com:1337">
         <string:34:"udp://tracker.coppersurfer.tk:6969">
         <string:28:"udp://exodus.desync.com:6969">
      ] ;
   ) ;  (* ^-- -> announce-list *)
   ( "info":
      {  ( "private", 0 );
         ( "name", <string:3:"lib"> );
         ( "piece length", 2097152 );
         ( "files":
            [  {  ( "length", 11062 );
                  ( "path", [ <string:13:"omgtorrent.ml"> ]; );
                  ( "sha1", <string:20:"n\152\164\017\020nn\185\210\020\162SK\021\148c\170\2171}"> );
                  ( "attr", <string:0:""> );
               } ;
               {  ( "length", 2086090 );
                  ( "path", [ <string:4:".pad">
                              <string:7:"2086090"> ]; );
                  ( "attr", <string:1:"p"> );
               } ;
               {  ( "length", 11 );
                  ( "path", [ <string:16:"omgtorrent.mllib"> ]; );
                  ( "sha1", <string:20:"\189\212\237\229\169\224w@\144\224K\213_\213\020\t%\210%\179"> );
                  ( "attr", <string:0:""> );
               } ;
               {  ( "length", 2097141 );
                  ( "path", [ <string:4:".pad">
                              <string:7:"2097141"> ]; );
                  ( "attr", <string:1:"p"> );
               } ;
               {  ( "length", 9133 );
                  ( "path", [ <string:14:"omgtorrent.ml~"> ]; );
                  ( "sha1", <string:20:"\178\160\253p.}I\159\007p\134\000\206+\242\210bb\186F"> );
                  ( "attr", <string:0:""> );
               } ;
               {  ( "length", 2088019 );
                  ( "path", [ <string:4:".pad">
                              <string:7:"2088019"> ]; );
                  ( "attr", <string:1:"p"> );
               } ;
               {  ( "length", 1390 );
                  ( "path", [ <string:14:"omgtorrent.mli"> ]; );
                  ( "sha1", <string:20:"I\245\220\014\162^\255[\158\181M\253\\\2073\148cK\025q"> );
                  ( "attr", <string:0:""> );
               } ;
               {  ( "length", 2095762 );
                  ( "path", [ <string:4:".pad">
                              <string:7:"2095762"> ]; );
                  ( "attr", <string:1:"p"> );
               } ;
               {  ( "length", 0 );
                  ( "path":
                     [  <string:3:"foo">
                        <string:4:"bars">
                        <string:10:"empty.file">
                     ] ;
                  ) ;  (* ^-- -> info ->  -> files -> [] -> path *)
                  ( "sha1", <string:20:"\2189\163\238^kK\r2U\191\239\149`\024\144\175\216\007\t"> );
                  ( "attr", <string:0:""> );
               } ;
               {  ( "length", 2097152 );
                  ( "path", [ <string:4:".pad">
                              <string:7:"2097152"> ]; );
                  ( "attr", <string:1:"p"> );
               } ;
            ] ;
         ) ;  (* ^-- -> info ->  -> files *)
         ( "pieces", <string:len 100> );
      } ;
   ) ;  (* ^-- -> info *)
   ( "magnet-info":
      {  ( "display-name", <string:3:"lib"> );
         ( "info_hash", <string:20:"\t\161\234\234s\158\251\201\006uxv\248a\014\187\248[\186h"> );
      } ;
   ) ;  (* ^-- -> magnet-info *)
} ;
rgrinberg commented 6 years ago

The old pretty_print should probably be named something like to_string_hum. Feel free to make such a breaking change if you'd like.

I haven't looked at the printing code too closely, but it seems alright. I'll give @c-cube a couple of days to have a look and then merge unless he objects.

c-cube commented 6 years ago

This is a good idea, and I think it should be merged. I'm a bit surprised by the nest-level (which is used to comment on the closing delimiters, right?) but it's reasonable, I think. My biggest gripe is that I find space in ( foo ) to be an eyesore :stuck_out_tongue:

cfcs commented 6 years ago

@c-cube yes @ nest-level and commenting on the closing delimiters. I found that these comments made it easier to make sense of large files, where things like

               } ;
            ] ;
         ) ;
         ( "pieces", <string:len 100> );
      } ;
   ) ; 

means you otherwise have to keep track of a lot of context when reading it to know where exactly "pieces" belongs (I had accidentally written "pieces" under the "files" list in my tool to make torrents). :-)

@c-cube: I can remove the spacing, I think it makes it easier to scan through, but I don't feel strongly for that. Is it only ( foo ) (ie dictionary keys) you dislike, or do you also want to get rid of spacing after { and [ ?

c-cube commented 6 years ago

It's just personal taste, don't change it just for me ^^

cfcs commented 6 years ago

The torrent I quoted above is actually invalid since it doesn't honor this requirement from BEP-003:

Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics).

Would it make sense to also add comments after keys that appear out of order? I feel like that would certainly be helpful during debugging.

When serializing we could also consider changing the type of Dict from Dict of Bencode.t list to something like:

module StringMap : Map.S with type key = string = Map.Make(String)
type serialize_t = (* in Bencode.mli *)
  | Integer of int64
  | String of string
  | List of serialize_t
  | Dict of serialize_t StringMap.t

to enforce this order?

cfcs commented 6 years ago

@rgrinberg @c-cube I pushed a proposal for the thing I suggested in the comment above. The output generated by this commit looks like this:

{  ( "announce", <string:40:"udp://tracker.leechers-paradise.org:6969"> );
   ( "announce-list":
      [  <string:40:"udp://tracker.leechers-paradise.org:6969">
         <string:21:"udp://zer0day.ch:1337">
         <string:27:"udp://open.demonii.com:1337">
         <string:34:"udp://tracker.coppersurfer.tk:6969">
         <string:28:"udp://exodus.desync.com:6969">
      ] ;
   ) ;  (* ^-- -> announce-list *)
   ( "info":
      {  ( "private", 0 );
         ( "name": (* error: out of order, BEP-003 needs ascending key order *)
            <string:3:"lib">
         ) ;  (* ^-- -> info ->  -> name *)
         ( "piece length", 2097152 );
         ( "files": (* error: out of order, BEP-003 needs ascending key order *)
            [  {  ( "length", 11150 );
                  ( "path", [ <string:13:"omgtorrent.ml"> ]; );
                  ( "sha1", <string:20:"\017\250l\176\149\018\254\229\000\224\\2\184P8|\196r\247Q"> );
                  ( "attr": (* error: out of order, BEP-003 needs ascending key order *)
                     <string:0:"">
                  ) ;  (* ^-- -> info ->  -> files -> [] -> attr *)
               } ;
               {  ( "length", 2086002 );
                  ( "path", [ <string:4:".pad">
                              <string:7:"2086002"> ]; );
                  ( "attr": (* error: out of order, BEP-003 needs ascending key order *)
                     <string:1:"p">
                  ) ;  (* ^-- -> info ->  -> files -> [] -> attr *)
               } ;
               {  ( "length", 11 );
                  ( "path", [ <string:16:"omgtorrent.mllib"> ]; );
                  ( "sha1", <string:20:"\189\212\237\229\169\224w@\144\224K\213_\213\020\t%\210%\179"> );
                  ( "attr": (* error: out of order, BEP-003 needs ascending key order *)
                     <string:0:"">
                  ) ;  (* ^-- -> info ->  -> files -> [] -> attr *)
               } ;
               {  ( "length", 2097141 );
                  ( "path", [ <string:4:".pad">
                              <string:7:"2097141"> ]; );
                  ( "attr": (* error: out of order, BEP-003 needs ascending key order *)
                     <string:1:"p">
                  ) ;  (* ^-- -> info ->  -> files -> [] -> attr *)
               } ;
               {  ( "length", 9133 );
                  ( "path", [ <string:14:"omgtorrent.ml~"> ]; );
                  ( "sha1", <string:20:"\178\160\253p.}I\159\007p\134\000\206+\242\210bb\186F"> );
                  ( "attr": (* error: out of order, BEP-003 needs ascending key order *)
                     <string:0:"">
                  ) ;  (* ^-- -> info ->  -> files -> [] -> attr *)
               } ;
               {  ( "length", 2088019 );
                  ( "path", [ <string:4:".pad">
                              <string:7:"2088019"> ]; );
                  ( "attr": (* error: out of order, BEP-003 needs ascending key order *)
                     <string:1:"p">
                  ) ;  (* ^-- -> info ->  -> files -> [] -> attr *)
               } ;
               {  ( "length", 1390 );
                  ( "path", [ <string:14:"omgtorrent.mli"> ]; );
                  ( "sha1", <string:20:"I\245\220\014\162^\255[\158\181M\253\\\2073\148cK\025q"> );
                  ( "attr": (* error: out of order, BEP-003 needs ascending key order *)
                     <string:0:"">
                  ) ;  (* ^-- -> info ->  -> files -> [] -> attr *)
               } ;
               {  ( "length", 2095762 );
                  ( "path", [ <string:4:".pad">
                              <string:7:"2095762"> ]; );
                  ( "attr": (* error: out of order, BEP-003 needs ascending key order *)
                     <string:1:"p">
                  ) ;  (* ^-- -> info ->  -> files -> [] -> attr *)
               } ;
            ] ;
         ) ;  (* ^-- -> info ->  -> files *)
         ( "pieces", <string:len 80> );
      } ;
   ) ;  (* ^-- -> info *)
} ;
c-cube commented 4 years ago

merged manually, thanks! :slightly_smiling_face: