Rewrote the Map functions used in the encoding step so that they rely almost entirely on NumPy for speed. Integer vectors now use int8 to save memory, and vectors that need floating point use float32. Results stay largely the same, with only negligible differences.
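For reviewers who want a picture of the approach, here is a minimal sketch of a vectorized mapping with compact dtypes. It is not the code in this PR: the alphabet `abc`, the `LOOKUP` table, and the names `map_line` and `to_float` are all hypothetical, chosen only to illustrate replacing a per-character Python loop with a single NumPy indexing operation.

```python
import numpy as np

# Hypothetical lookup table: one int8 code per possible byte value.
# Bytes outside the (assumed) alphabet map to -1.
LOOKUP = np.full(256, -1, dtype=np.int8)
for code, byte in enumerate(b"abc"):
    LOOKUP[byte] = code


def map_line(line: str) -> np.ndarray:
    """Encode an ASCII line as an int8 vector with one indexing operation."""
    raw = np.frombuffer(line.encode("ascii"), dtype=np.uint8)
    # Fancy indexing applies the table to every byte at once, no Python loop.
    return LOOKUP[raw]


def to_float(codes: np.ndarray) -> np.ndarray:
    """Convert codes to float32 (not the default float64) where floats are needed."""
    return codes.astype(np.float32)
```

An int8 element takes 1 byte and a float32 element takes 4 bytes, compared with 8 bytes each for NumPy's default int64 (on most platforms) and float64. The catch is that int8 only holds values from -128 to 127, so this only works when the codes fit in that range.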
Removed the step where the matrices were saved to disk and then loaded back into memory under a different variable name, which increased memory consumption.
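A rough illustration of why that round trip costs memory, assuming the original matrix was still referenced after the reload. The helper `roundtrip_copy` is hypothetical and uses an in-memory buffer in place of a file on disk.

```python
import io

import numpy as np


def roundtrip_copy(matrix: np.ndarray) -> np.ndarray:
    """Save and reload a matrix, as the removed step did (in-memory stand-in for disk)."""
    buf = io.BytesIO()
    np.save(buf, matrix)
    buf.seek(0)
    return np.load(buf)


matrix = np.zeros((1000, 50), dtype=np.int8)

# The reloaded array is a second, independent buffer of the same size.
reloaded = roundtrip_copy(matrix)
print(np.shares_memory(matrix, reloaded))

# Binding a new name to the existing array allocates nothing new.
alias = matrix
print(np.shares_memory(matrix, alias))
```

While both `matrix` and `reloaded` are alive, the data is held twice. Reusing the existing array avoids the second allocation.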
Added .gitignore to be on the safe side when tinkering with the code.
Processing time of the Map functions has dropped from X days (the exact figure is unknown because the run never finished) to under 1-2 minutes for a 2M-line file.
If you have any questions or suggestions, please don't hesitate to ask.