vstinner / hachoir

Hachoir is a Python library to view and edit a binary stream field by field
http://hachoir.readthedocs.io/
GNU General Public License v2.0

[help] example on how to parse an iso file #81

Open eadmaster opened 1 year ago

eadmaster commented 1 year ago

Follow-up to: https://stackoverflow.com/questions/45107320/parsing-an-iso-file-with-hachoir

Is it possible to iterate over these hachoir.parser.file_system.iso9660.Volume items and list the filenames inside the ISO file?

vstinner commented 1 year ago

If I download dsl-4.11.rc1.iso from http://distro.ibiblio.org/damnsmall/release_candidate/, hachoir-urwid shows me an ISO 9660 file system:

0) file:/home/vstinner/dsl-4.11.rc1.iso: ISO 9660 file system (50.6 MB)
   0) padding[0]= <null> (32.0 KB)
 - 32768) volume[0] (2048 bytes)
      0) type= Primary Volume Descriptor: Volume descriptor type (1 bytes)
      1) signature= "CD001": ISO 9960 signature (CD001) (5 bytes)
      6) version= 1: Volume descriptor version (1 bytes)
    - 7) content (2041 bytes)
         0) unused[0]= <null> (1 bytes)
         1) system_id= "LINUX": System identifier (32 bytes)
         33) volume_id= "KNOPPIX": Volume identifier (32 bytes)
         65) unused[1]= <null> (8 bytes)
         73) space_size= 5504806119530325324: Volume space size (8 bytes)
         81) unused[2]= <null> (32 bytes)
         113) set_size= 16777217: Volume set size (4 bytes)
         117) seq_num= 16777217: Sequence number (4 bytes)
         121) block_size= 526336: Block size (4 bytes)
         125) path_table_size= 3891110078048108598: Path table size (8 bytes)
         133) occu_lpath= 352321536: Location of Occurrence of Type L Path Table (4 bytes)
         137) opt_lpath= 0: Location of Optional of Type L Path Table (4 bytes)
         141) occu_mpath= 23: Location of Occurrence of Type M Path Table (4 bytes)
         145) opt_mpath= 0: Location of Optional of Type M Path Table (4 bytes)
         149) root= "\"\0\x1d\0\0\0\0\0\0\x1d\0\b\0\0(...)": Directory Record for Root Directory (34 bytes)
         183) vol_set_id= (empty): Volume set identifier (128 bytes)
         311) publisher= (empty): Publisher identifier (128 bytes)
         439) data_preparer= (empty): Data preparer identifier (128 bytes)
         567) application= "MKISOFS ISO 9660/HFS FILESYSTEM BUILDER (...)": Application identifier (128 bytes)
         695) copyright= (empty): Copyright file identifier (37 bytes)
         732) abstract= (empty): Abstract file identifier (37 bytes)
         769) biographic= (empty): Biographic file identifier (37 bytes)
         806) creation_ts= "2012080318122900ð": Creation date and time (17 bytes)
         823) modification_ts= "2012080318122900ð": Modification date and time (17 bytes)
         840) expiration_ts= "0000000000000000\0": Expiration date and time (17 bytes)
         857) effective_ts= "2012080318122900ð": Effective date and time (17 bytes)
         874) struct_ver= 1: Structure version (1 bytes)
         875) unused[3]= <null> (1 bytes)
         876) app_use= (empty): Application use (512 bytes)
         1388) unused[4]= <null> (653 bytes)
 - 34816) volume[1] (2048 bytes)
      0) type= Boot Record: Volume descriptor type (1 bytes)
      1) signature= "CD001": ISO 9960 signature (CD001) (5 bytes)
      6) version= 1: Volume descriptor version (1 bytes)
    - 7) content (2041 bytes)
         0) sys_id= "EL TORITO SPECIFICATION": Boot system identifier (31 bytes)
         31) boot_id= (empty): Boot identifier (31 bytes)
         62) system_use= "\0\0&\0\0\0\0\0\0\0\0\0\0\0(...)": Boot system use (1979 bytes)
 - 36864) volume[2] (2048 bytes)
      0) type= Supplementary Volume Descriptor: Volume descriptor type (1 bytes)
      1) signature= "CD001": ISO 9960 signature (CD001) (5 bytes)
      6) version= 1: Volume descriptor version (1 bytes)
      7) raw_content= "\0\0L\0I\0N\0U\0X\0 \0(...)": Raw data (2041 bytes)
 - 38912) volume[3] (2048 bytes)
      0) type= Volume Descriptor Set Terminator: Volume descriptor type (1 bytes)
      1) signature= "CD001": ISO 9960 signature (CD001) (5 bytes)
      6) version= 1: Volume descriptor version (1 bytes)
    - 7) content (2041 bytes)
         0) null= <null> (2041 bytes)
   40960) end= "MKI Fri Aug  3(...)" (50.6 MB)
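
A note on the large numbers in the dump above: ISO 9660 (ECMA-119) stores several Primary Volume Descriptor fields twice, once little-endian and once big-endian. The values shown for space_size, set_size, seq_num, block_size and path_table_size look like both copies read as one integer. The sketch below is not part of Hachoir; split_both_endian is a hypothetical helper that takes the low half of a displayed value and checks it against the byte-swapped high half.

```python
def split_both_endian(raw, half_bits):
    """Recover the real value from a both-endian field shown as one integer.

    raw       -- the combined integer as displayed in the dump
    half_bits -- width of one copy in bits (16 or 32 for the fields above)
    """
    mask = (1 << half_bits) - 1
    low = raw & mask
    high = raw >> half_bits
    # The high half holds the same number in the opposite byte order.
    swapped = int.from_bytes(high.to_bytes(half_bits // 8, "big"), "little")
    if swapped != low:
        raise ValueError("the two halves disagree: not a both-endian field")
    return low


block_size = split_both_endian(526336, 16)                 # 2048
space_size = split_both_endian(5504806119530325324, 32)    # 25932
path_table = split_both_endian(3891110078048108598, 32)    # 54
print(block_size, space_size, path_table)
print(space_size * block_size)                             # 53108736 bytes
```

25932 blocks of 2048 bytes is 53108736 bytes, which agrees with the 50.6 MB reported for the file. By the same reading, occu_lpath= 352321536 (0x15000000) is probably sector 21 stored little-endian and displayed with the bytes swapped.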

I'm not sure what you are looking for. Most basic example:

from sys import stderr, exit

from hachoir.parser import createParser

filename = "/home/vstinner/dsl-4.11.rc1.iso"

# createParser() returns None when no parser recognizes the file
parser = createParser(filename)
if not parser:
    print("Unable to parse file", file=stderr)
    exit(1)

# List the names of the top-level fields of the parsed file
with parser:
    for field in parser:
        print(field.name)

Output:

padding[0]
volume[0]
volume[1]
volume[2]
volume[3]
end
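
The loop above only visits the top-level fields. A minimal sketch of a recursive walk, assuming the usual Hachoir field attributes (name, path, display, is_field_set); iter_fields and dump are hypothetical helper names, not Hachoir functions:

```python
def iter_fields(field_set, depth=0):
    """Yield (depth, field) for every field below field_set, depth-first."""
    for field in field_set:
        yield depth, field
        # Field sets contain sub-fields; plain fields do not.
        if getattr(field, "is_field_set", False):
            yield from iter_fields(field, depth + 1)


def dump(filename):
    """Print the path of every field, plus the value of leaf fields."""
    # Imported here so that iter_fields() stays usable without Hachoir.
    from hachoir.parser import createParser

    parser = createParser(filename)
    if not parser:
        raise SystemExit("Unable to parse file")
    with parser:
        for depth, field in iter_fields(parser):
            if field.is_field_set:
                print(field.path)
            else:
                print(field.path, "=", field.display)
```

Calling dump("/home/vstinner/dsl-4.11.rc1.iso") should reach nested fields such as volume[0]/content/volume_id. It will not print filenames for this image, because everything after the volume descriptors is exposed as the single raw field named end.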

eadmaster commented 1 year ago

OK, how can I list (and extract) the files inside each volume?

vstinner commented 1 year ago

> OK, how can I list (and extract) the files inside each volume?

I don't know where filenames are stored in an ISO file system. Maybe the Hachoir parser is incomplete.
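
For reference, ECMA-119 (ISO 9660) keeps filenames in directory records. The 34-byte root field in the Primary Volume Descriptor (offset 156 in the descriptor) points to the root directory, and each record carries an extent location, a data length, a flags byte and a name. The sketch below does not use Hachoir. It is a plain-Python illustration under stated assumptions: 2048-byte sectors, plain ISO 9660 names only (no Rock Ridge or Joliet), the whole image in memory. parse_record, list_directory and list_iso are hypothetical names.

```python
import struct

SECTOR = 2048  # logical sector size assumed by this sketch


def parse_record(buf, pos):
    """Decode one directory record starting at buf[pos]."""
    length = buf[pos]
    extent, = struct.unpack_from("<I", buf, pos + 2)   # little-endian copy
    size, = struct.unpack_from("<I", buf, pos + 10)    # little-endian copy
    is_dir = bool(buf[pos + 25] & 0x02)                # flags bit 1: directory
    name_len = buf[pos + 32]
    name = bytes(buf[pos + 33:pos + 33 + name_len])
    return length, extent, size, is_dir, name


def list_directory(image, extent, size, block_size, prefix=""):
    """Yield (path, extent, size) for every file below one directory."""
    start = extent * block_size
    data = image[start:start + size]
    pos = 0
    while pos < len(data):
        if data[pos] == 0:
            # A zero length byte means the rest of this sector is unused.
            pos = (pos // SECTOR + 1) * SECTOR
            continue
        length, child_extent, child_size, is_dir, name = parse_record(data, pos)
        pos += length
        if name in (b"\x00", b"\x01"):   # entries for "." and ".."
            continue
        path = prefix + "/" + name.decode("ascii", "replace")
        if is_dir:
            yield from list_directory(image, child_extent, child_size,
                                      block_size, path)
        else:
            yield path, child_extent, child_size


def list_iso(image):
    """Return [(path, extent, size), ...] for an ISO 9660 image held in bytes."""
    pvd = 16 * SECTOR                    # volume[0] in the dump (offset 32768)
    if image[pvd] != 1 or image[pvd + 1:pvd + 6] != b"CD001":
        raise ValueError("no Primary Volume Descriptor at sector 16")
    block_size, = struct.unpack_from("<H", image, pvd + 128)
    _, extent, size, _, _ = parse_record(image, pvd + 156)
    return list(list_directory(image, extent, size, block_size))
```

To try it, read the image with open(filename, "rb").read() and pass the bytes to list_iso. The content of a file is then image[extent * block_size:extent * block_size + size]. Names come back in their on-disc form, typically upper case with a ";1" version suffix. Long names would need the Joliet data behind the Supplementary Volume Descriptor (volume[2] in the dump) or Rock Ridge entries, which this sketch ignores.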