Open bratao opened 7 years ago
Hello @bratao,
I'm writing a Python wrapper for your excellent library, which I plan to release soon.
Thank you for your work! I'm really excited to hear that :)
I'm on vacation in Montreal right now; I'll reply to you after the 12th.
See you soon!
Hello @bratao,
I have now reached InternalDataSequence::accumulateFeatureData(). It is responsible for 70% of memory usage during training. Do you have any idea how it could be optimized?
I thought the part that uses the most memory would be this (https://github.com/hiroshi-manabe/CRFSegmenter/blob/4c274a871a4e90b727b2cd166f4367ce71e0b519/HighOrderCRF/HighOrderCRFProcessor.cpp#L173), which can reach hundreds of GB in my application (CJK segmentation / POS tagging), so I never seriously looked into optimizing accumulateFeatureData().
Can you give me the data you used for testing?
Hello again @hiroshi-manabe,
I'm writing a Python wrapper for your excellent library, which I plan to release soon.
However, while porting some internal projects to this library, I found that memory usage exploded compared to CRFSuite.
I have just started analyzing whether I can reduce the memory usage. I plan to use compact data structures to store the data, such as https://github.com/Tessil/hat-trie, and I have already seen some good improvements.
I have now reached InternalDataSequence::accumulateFeatureData(). It is responsible for 70% of memory usage during training. Do you have any idea how it could be optimized?
Thank you