prewk / xml-string-streamer

Stream large XML files with low memory consumption.
MIT License
356 stars 49 forks

Multitasking #55

Closed KaduMoura closed 7 years ago

KaduMoura commented 7 years ago

Hi,

I'd like to know whether it's possible to split the file into several parts and process them in parallel (e.g. via a queue). If not, it would be a nice feature.

Thanks!

prewk commented 7 years ago

Hi!

What would be the purpose? I don't see any performance gain.

KaduMoura commented 7 years ago

I have a large XML file with 18k records. For each record I have to import it into the database and download all its images (which takes most of the time).

That's taking too long. If I could process more than one record at a time, it would be much faster.

prewk commented 7 years ago

I see. You mentioned queuing, which wouldn't be faster.

PHP is by nature single-threaded (you can of course spawn threads, but with caveats), and the problem is that your solution assumes it's easy to split the file at certain points.

To be able to split it, you have to parse it, and thus you're back to "slowness". That's why I'd say it's a bad fit for this library.

With that said, the best thing you could do is probably to not download the images while parsing. Save the URLs while parsing and download them afterwards in some other fashion (perhaps multi-threaded?).
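To illustrate the "collect first, download later" idea, here is a minimal sketch in Python rather than PHP, purely to show the shape of the second phase. It assumes the parsing pass has already produced a plain list of image URLs. The names `fetch` and `download_all`, the worker count, and the timeout are hypothetical choices, not part of this library.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=30):
    """Download one image and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch_one=fetch, max_workers=8):
    """Download every URL concurrently.

    Returns a dict mapping each URL to its bytes, or to the exception
    that was raised, so a single failed download does not stop the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front so downloads overlap in time.
        futures = {url: pool.submit(fetch_one, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results
```

Because the downloads are network-bound, overlapping them in a thread pool is where most of the speedup would likely come from. The XML parsing itself stays sequential.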

If you insist on splitting, do that with some fast command-line tool if possible, and then start one PHP script per file or something. Might work.
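The "one script per file" part could be sketched roughly like this, again in Python and only as an illustration. It assumes the file has already been split by some external tool. The helper name `run_per_file` is hypothetical, and so are `import.php` and the chunk file names mentioned afterwards.

```python
import subprocess


def run_per_file(command, files):
    """Start `command <file>` for every file at once, then wait for all.

    Returns a dict mapping each file to the exit code of its process.
    """
    # Launch all worker processes first so they run side by side.
    procs = {path: subprocess.Popen([*command, path]) for path in files}
    # Then block until each one has finished and collect its exit code.
    return {path: proc.wait() for path, proc in procs.items()}
```

A call such as `run_per_file(["php", "import.php"], ["part-1.xml", "part-2.xml"])` would start one PHP process per chunk. In practice you would probably want to cap the number of simultaneous processes.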

Good luck!

KaduMoura commented 7 years ago

OK, thanks for the help!