inter1965 closed this issue.
This sounds like a great idea. Would you mind sending me a large RELION model star file for testing?
I will probably try using StringIO because I have used it before, but if you have experience with mmap and would like to have a go, it would be good to compare both!
The idea of an option for reading just the first few blocks is a good one and should work with the way things are set up. I'll look into it over the next week or so.
Cheers,
Alister
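A minimal sketch of the StringIO idea, assuming the lines of a single loop_ block have already been separated into column labels and data rows. `read_loop_block` is a hypothetical helper, not starfile's `_read_loop_data`; it only illustrates handing pandas.read_csv one in-memory buffer per table instead of parsing the rows one at a time.

```python
import io

import pandas as pd


def read_loop_block(label_lines, data_lines):
    """Parse one STAR loop_ table with a single pandas.read_csv call.

    Hypothetical helper for illustration; not part of the starfile API.
    label_lines: loop labels such as '_rlnClassNumber #1'
    data_lines: whitespace-separated data rows of the same loop
    """
    # Keep the label itself, dropping the leading underscore and the '#N' index.
    columns = [line.split()[0].lstrip("_") for line in label_lines]
    # Join every row once and wrap the text in an in-memory buffer, so pandas
    # tokenises the whole table in one pass.
    buffer = io.StringIO("\n".join(data_lines))
    return pd.read_csv(buffer, sep=r"\s+", header=None, names=columns)


df = read_loop_block(
    ["_rlnClassNumber #1", "_rlnClassDistribution #2"],
    ["1 0.25", "2 0.75"],
)
print(df)
```

An mmap-based variant would map the file once and hand out slices per block; whether it beats StringIO here would need to be measured, as suggested above.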
On 12 Oct 2020, at 10:02, Xhark notifications@github.com wrote:
When dealing with a star file that contains thousands of data blocks (e.g. a 3D classification model file), would it be possible to pass an mmap or StringIO buffer object to pandas.read_csv in _read_loop_data to improve performance? Otherwise it can take hours to read a model star file. An option to read just the first few blocks would also be a nice addition to speed up specific cases. Cheers.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.
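One way the "first few blocks" option could work, sketched as a generator that splits STAR text on data_ headers and stops once max_blocks blocks have been yielded. Because it consumes lines lazily, passing an open file object means the remaining blocks are never read. `iter_data_blocks` and `max_blocks` are hypothetical names, not starfile's actual parameters, and the STAR text below is a toy example.

```python
def iter_data_blocks(lines, max_blocks=None):
    """Yield (block_name, body_lines) for each data_ block in a STAR file.

    Hypothetical helper for illustration. `lines` can be any iterable of
    strings, including an open file; iteration stops as soon as
    `max_blocks` blocks have been yielded, so later blocks are never read.
    """
    name, body, yielded = None, [], 0
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("data_"):
            # A new header closes the previous block.
            if name is not None:
                yield name, body
                yielded += 1
            if max_blocks is not None and yielded >= max_blocks:
                return
            name, body = stripped[len("data_"):], []
        elif name is not None and stripped and not stripped.startswith("#"):
            # Keep non-empty, non-comment lines belonging to the current block.
            body.append(stripped)
    # The last block has no following header, so emit it here if allowed.
    if name is not None and (max_blocks is None or yielded < max_blocks):
        yield name, body


star_text = """data_model_general
_rlnNrClasses 2

data_model_classes
loop_
_rlnClassDistribution #1
0.25
0.75

data_model_class_1
loop_
_rlnSpectralIndex #1
0
"""

blocks = list(iter_data_blocks(star_text.splitlines(), max_blocks=2))
print([name for name, _ in blocks])  # ['model_general', 'model_classes']
```

Each yielded body could then be handed to a per-block table parser, so skipped blocks cost nothing beyond the lines read before the cutoff.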
Wow, the mail system complained about the size of the original file. Hope the zipped version works. Best, X
On Mon, Oct 12, 2020 at 9:40 PM 张皛闶 inter1965@gmail.com wrote:
Hi Alisterburt, please find attached the model star file (>22 MB); it takes several hours to parse. I hope there's a good workaround. Cheers, X
Thanks!
I have a solution that reads files much faster now, but it's failing a couple of tests, so I need to debug a little more. Hopefully I'll have it up very soon.
On my machine the new parser reads a 1-million-line file, split into 1000 blocks that are each a 1000 × 10 table, in ~10 s. I've also added an option to read only the first N blocks, in case you want to skip the rest as you suggested.
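To try a benchmark of a similar shape (1000 blocks × 1000 rows = 1,000,000 data rows, 10 columns each), a synthetic file can be generated with a sketch like the one below. `write_synthetic_star` and the `_columnN` labels are made up for illustration; this is not the file behind the ~10 s figure quoted above.

```python
import io
import random


def write_synthetic_star(stream, n_blocks=1000, n_rows=1000, n_cols=10, seed=0):
    """Write n_blocks loop_ blocks, each an n_rows x n_cols table of random floats.

    Hypothetical benchmark helper; `stream` is any writable text stream.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs produce identical files
    for b in range(n_blocks):
        stream.write(f"data_block_{b}\n\nloop_\n")
        for c in range(1, n_cols + 1):
            stream.write(f"_column{c} #{c}\n")
        for _ in range(n_rows):
            row = " ".join(f"{rng.random():.6f}" for _ in range(n_cols))
            stream.write(row + "\n")
        stream.write("\n")


# Small demo in memory; the defaults give the full-size file.
buffer = io.StringIO()
write_synthetic_star(buffer, n_blocks=2, n_rows=3, n_cols=2)
print(buffer.getvalue())
```

For the full-size file, pass an open file handle with the default sizes, e.g. `with open("synthetic.star", "w") as handle: write_synthetic_star(handle)`, and time the parser on the result.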
On Tue, 13 Oct 2020 at 15:23, Xhark notifications@github.com wrote:
Here's one example; I hope it helps. model.zip https://github.com/alisterburt/starfile/files/5371573/model.zip
I got quite lost in bugs trying to finish the optimisations, so I reverted to the last working state (which was already faster) and pushed it to PyPI as v0.2.3. I'll go over it properly when I have a little more time.
Refactored, clean, and working nicely now; hopefully this makes things faster for you. Please update to v0.3.1.