internetarchive / warc

Python library for reading and writing warc files
GNU General Public License v2.0

fast seek() for multiprocessing #28

Open meshiguge opened 6 years ago

meshiguge commented 6 years ago

I want to split a WARC file into small chunks and then process them with multiprocessing in Python.

For a plain text file we can use seek(), but how do we seek with the warc module or in gzipped (.gz) WARC files? Any advice?

kartheek7895 commented 6 years ago

You can open it as a gzip file and seek to the position you want, then pass the file object to WARCReader from there.
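
One caveat: seeking on a gzip file object works on uncompressed offsets and is done by decompressing up to the target, so it is not a cheap jump. A possible workaround, sketched below with the standard library only, assumes the .warc.gz was written with one gzip member per record (a common convention, but not guaranteed for every file). Under that assumption you can scan the file once for the compressed offset of each member and then let each worker seek the raw file directly. The helper names `member_offsets` and `read_member` are made up for this illustration and are not part of the warc module.

```python
import zlib


def member_offsets(path, chunk_size=1 << 20):
    """Return the compressed byte offset at which each gzip member starts.

    Assumes the file is a plain concatenation of gzip members.
    """
    offsets = []
    with open(path, "rb") as raw:
        pos = 0        # compressed offset of the first byte in buf
        buf = b""
        decomp = None
        while True:
            if not buf:
                buf = raw.read(chunk_size)
                if not buf:
                    break
            if decomp is None:
                # A new member starts here; wbits=31 means "expect gzip framing".
                offsets.append(pos)
                decomp = zlib.decompressobj(31)
            decomp.decompress(buf)  # output is discarded, we only want boundaries
            if decomp.eof:
                # Bytes after the end of this member belong to the next one.
                unused = decomp.unused_data
                pos += len(buf) - len(unused)
                buf = unused
                decomp = None
            else:
                pos += len(buf)
                buf = b""
    return offsets


def read_member(path, offset, chunk_size=65536):
    """Seek the raw file to a member offset and decompress just that member."""
    with open(path, "rb") as raw:
        raw.seek(offset)
        decomp = zlib.decompressobj(31)
        parts = []
        while not decomp.eof:
            chunk = raw.read(chunk_size)
            if not chunk:
                break
            parts.append(decomp.decompress(chunk))
        return b"".join(parts)
```

The list returned by `member_offsets` can be split into ranges and handed to worker processes (for example with `multiprocessing.Pool`), each of which opens the file itself and starts at its own offset. To follow the suggestion above instead of decompressing by hand, a worker could seek the raw file to its offset, wrap it with `gzip.GzipFile(fileobj=raw)`, and pass that object to WARCReader; whether WARCReader accepts such a mid-file object is an assumption worth checking against your version of the library.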