openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

transcript_support_level attribute causes an error when its value contains a space. #297

Open y9c opened 9 months ago

y9c commented 9 months ago

As mentioned in the gtfparse code, the latest Ensembl release adds "(assigned to previous version ?)" to the transcript_support_level tag. gtfparse tries to work around this by splitting all attribute values, but that is not a good fix: some attributes may legitimately need a space inside their value.

https://github.com/openvax/gtfparse/blame/7d25135fed6a1a7c60218cedc1dfac2446683183/gtfparse/attribute_parsing.py#L73-L76C60
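
To make the concern concrete, here is a minimal sketch of the two parsing strategies. It is not the actual gtfparse implementation. The function names, the sample attribute string, the version number "5" in the note, and the `note` attribute are all made up for illustration.

```python
import re

# Hypothetical GTF attribute column. The "5" and the `note` attribute are
# invented for this example and do not come from a real Ensembl file.
ATTRS = (
    'gene_id "ENSG00000000001"; '
    'transcript_support_level "1 (assigned to previous version 5)"; '
    'note "free text value";'
)


def parse_by_first_space(attrs):
    """Keep only the token before the first space inside each value."""
    result = {}
    for field in attrs.split(";"):
        field = field.strip()
        if not field:
            continue
        # Everything after the second space-separated token is dropped.
        key, value = field.split(" ", 2)[:2]
        result[key] = value.strip('"')
    return result


# Assumes every value is double-quoted and contains no embedded quotes.
QUOTED_PAIR = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_quoted(attrs):
    """Keep the full quoted value, including any spaces inside it."""
    return dict(QUOTED_PAIR.findall(attrs))


truncated = parse_by_first_space(ATTRS)
preserved = parse_quoted(ATTRS)
```

With the first strategy the support level comes out as a clean `1`, but the hypothetical `note` value is cut down to `free`. The second strategy keeps both values intact and leaves any trimming to the caller. Note that this sketch stores attributes in a plain dict, so a key that appears more than once on a line keeps only its last value.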

Solution:

https://github.com/openvax/pyensembl/commit/e03a213ccedda997ecb868490e62367f779f8c9a

iskandr commented 7 months ago

Can you say more about when this fails? Do you think the right answer is to have gtfparse preserve the full string and then have PyEnsembl split it?
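
For reference, the second option could look roughly like the sketch below, assuming gtfparse hands over the full string unchanged. `split_support_level` is a hypothetical helper, not an existing PyEnsembl or gtfparse function, and the sample values are illustrative.

```python
def split_support_level(raw):
    """Split a raw transcript_support_level value into (level, note).

    The level is the text before the first space. The note is whatever
    follows, or None when there is nothing after the level.
    """
    level, _, note = raw.strip().partition(" ")
    return level, (note or None)


# Illustrative inputs; the "5" in the first one is made up.
with_note = split_support_level("1 (assigned to previous version 5)")
plain = split_support_level("3")
missing = split_support_level("NA")
```

Only the one attribute known to carry a trailing note is split this way, so other attributes keep any spaces in their values.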