Closed teresi closed 1 year ago
Hi Michael,
As we discussed over text, this commit breaks reading of old TE Density output data, so users would have to regenerate their output data if they want to interrogate it with the new DensityData class. Perhaps we should consider changing the version number. I'll leave that determination and change up to you.
Other than that, I had to add in some code to the DensityData and MergeData files to accommodate the string change. I mimicked your str decode method in the DensityData init section and applied it to the chromosome ID, and the order and superfamily TE strings. That was needed because those things were also being converted into byte strings and needed the decoder.
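For reference, here is a minimal sketch of the kind of decode step described above. It assumes the values come back from the HDF5 file as byte strings (h5py 3.x reads variable-length strings as bytes by default). The helper name `decode_if_bytes` and the sample values are hypothetical and are not the actual code in DensityData.

```python
def decode_if_bytes(value, encoding="utf-8"):
    """Return a str for bytes input; leave anything else untouched.

    Hypothetical helper for illustration, not the DensityData implementation.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Sample values standing in for what is read back from the file (assumed).
raw_chromosome = b"Chr1"
raw_orders = [b"LTR", b"TIR"]

chromosome = decode_if_bytes(raw_chromosome)
orders = [decode_if_bytes(order) for order in raw_orders]

print(chromosome, orders)
```

Leaving non-bytes input untouched lets the same call handle values that are already str.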
Finally, I had to modify how we pass the chromosome ID string as an input arg to write_vlen_str_h5py in merge_data.py. The chromosome needed to be a list of strings, because when it was just a plain string it was getting broken up during the byte str operation. E.g. 'Chr1' was becoming b'C', b'h', b'r', b'1', and then DensityData would yell at me because it was treating those as multiple unique chromosomes.
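The splitting behaviour is easy to reproduce outside of h5py, since a Python str is itself an iterable of one-character strings. The sketch below uses a hypothetical helper, `to_byte_strings`, to stand in for the per-element byte conversion. It is not the real write_vlen_str_h5py.

```python
def to_byte_strings(values):
    """Encode each element of an iterable of str to bytes.

    Hypothetical stand-in for the byte str conversion, for illustration only.
    """
    return [value.encode("utf-8") for value in values]


# A bare string is iterated character by character.
as_bare_string = to_byte_strings("Chr1")

# Wrapping the ID in a list keeps it as a single element.
as_list = to_byte_strings(["Chr1"])

print(as_bare_string)  # [b'C', b'h', b'r', b'1']
print(as_list)         # [b'Chr1']
```

Counting unique values in the first result gives four "chromosomes" instead of one, which matches the complaint from DensityData.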
I had to manually investigate the data with DensityData to figure all of this out. So even though the tests "worked", they obscured the fact that this update would break things. I'll look into writing more tests but may need some assistance there...
I will also begin removing scipy from the requirements.
ok, I'll take a look
since I didn't see that in our tests, we'll need that added
DESIGN
Python 3.11 just came out so I wanted to see if we could upgrade.
One issue was that we had an old h5py (2.10 vs 3.x), so I figured we should fix that before upgrading.
FUTURE
We should probably remove scipy since it was only required by an example and it gave me issues.
I had to install these on 22.04 but will need to try again after the H5PY upgrade:
DensityData should be refactored into MergeData (it would be best to call them both DensityData), but we should probably hold off on that until we refactor it. We can talk more about that later.
TESTING