openvax / gtfparse

Parsing tools for GTF (gene transfer format) files
Apache License 2.0

BackTrace for exception when parsing gtf file #38

Open borsheimFXBG opened 1 year ago

borsheimFXBG commented 1 year ago

I am trying to use your GTF reader to truncate a very large GTF file for testing purposes. However, when I call read_gtf to load the file, I get the error below. I am happy to send you the data file, but be aware that it is 391 MB; please let me know. I downloaded it from EMBL. My code is below the traceback. Ignore the docstring; it describes the finished product.

My email is davidsborsheim@lewisu.edu

TRACEBACK:

    /media/akiva/Data/FinalProj3/venv/bin/python /media/akiva/Data/FinalProj3/ExternalCode/DataClean/PullLines.py
    Traceback (most recent call last):
      File "/media/akiva/Data/FinalProj3/ExternalCode/DataClean/PullLines.py", line 65, in <module>
        parsegtf(inpath, outpath)
      File "/media/akiva/Data/FinalProj3/ExternalCode/DataClean/PullLines.py", line 48, in parsegtf
        gtf_in = read_gtf(inpath)
      File "/media/akiva/Data/FinalProj3/venv/lib/python3.10/site-packages/gtfparse/read_gtf.py", line 254, in read_gtf
        result_df = parse_gtf_and_expand_attributes(
      File "/media/akiva/Data/FinalProj3/venv/lib/python3.10/site-packages/gtfparse/read_gtf.py", line 189, in parse_gtf_and_expand_attributes
        df = parse_gtf(
      File "/media/akiva/Data/FinalProj3/venv/lib/python3.10/site-packages/gtfparse/read_gtf.py", line 155, in parse_gtf
        df_lazy = parse_with_polars_lazy(
      File "/media/akiva/Data/FinalProj3/venv/lib/python3.10/site-packages/gtfparse/read_gtf.py", line 118, in parse_with_polars_lazy
        df = polars.scan_csv(
    TypeError: scan_csv() got an unexpected keyword argument 'sep'

My Code:

    from gtfparse import read_gtf

    def parsegtf(inpath, outpath):
        '''Read a GTF file and write a test file containing either the whole
        file or the first 50,000 lines, whichever is shorter.'''
        # Lifted from https://pypi.org/project/gtfparse/
        # Returns the GTF with essential columns such as "feature", "seqname", "start", "end"
        # alongside the names of any optional keys which appeared in the attribute column
        gtf_in = read_gtf(inpath)

        breakpoint()  # pause in the debugger to inspect the parsed result
        print(gtf_in)

        # Writing the truncated file to outpath is not implemented yet.
        return
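
The goal described in the docstring, a test file holding the first 50,000 lines or the whole file if it is shorter, can be reached without parsing the GTF at all. The following is a minimal sketch under that assumption; the function name `truncate_text_file` and its parameters are hypothetical and are not part of gtfparse. Note that it counts raw lines, so any `#` header lines at the top of the file count toward the limit.

```python
from itertools import islice


def truncate_text_file(src_path, dst_path, max_lines=50_000):
    """Copy at most max_lines lines from src_path to dst_path.

    Returns the number of lines actually written. Hypothetical helper,
    not part of gtfparse.
    """
    written = 0
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # islice stops after max_lines, or earlier if the file is shorter,
        # so the whole file is never loaded into memory.
        for line in islice(src, max_lines):
            dst.write(line)
            written += 1
    return written
```

Because the file is streamed line by line, a 391 MB input should not need to fit in memory, and the resulting smaller file can then be passed to read_gtf for testing.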

iskandr commented 9 months ago

Fixed by https://github.com/openvax/gtfparse/pull/42 -- sorry I left this hanging for so long.
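
For readers who hit the same TypeError: the traceback shows that the installed polars rejects the `sep` keyword that gtfparse passed to `polars.scan_csv`. To my understanding, newer polars releases name this parameter `separator`, but treat that as an assumption rather than a statement of what the linked pull request changed. The sketch below shows one generic way to pick whichever keyword a given function accepts; `separator_kwargs` is a hypothetical helper, not part of gtfparse or polars.

```python
import inspect


def separator_kwargs(csv_func, delimiter="\t"):
    """Return {"separator": delimiter} or {"sep": delimiter}, depending on
    which keyword csv_func accepts. Hypothetical helper for illustration.
    """
    params = inspect.signature(csv_func).parameters
    # Prefer the name assumed to be the newer one, then fall back.
    for name in ("separator", "sep"):
        if name in params:
            return {name: delimiter}
    raise TypeError(
        f"{getattr(csv_func, '__name__', csv_func)!r} accepts neither "
        "'separator' nor 'sep'"
    )


# Assumed usage (requires polars to be installed):
#   import polars
#   lazy_df = polars.scan_csv(path, **separator_kwargs(polars.scan_csv))
```

In practice, upgrading gtfparse to a release that includes the fix, or installing a polars version that matches the gtfparse release in use, is likely the simpler route; the helper only illustrates why the mismatch arises.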