iMetOsaka / UNAGI


Error at Finding Splicing Isoforms #10

Open jayradke opened 1 year ago

jayradke commented 1 year ago

We are trying to use UNAGI to study gene expression in hamster lung after viral infection. We get the following error once the script reaches the "Finding splicing isoforms" step. Right now we are only running a limited data set.

2022/12/01 - 10:24:43] Sorting the new mapped reads
[2022/12/01 - 10:24:43] Generating the genome coverage for each position
[2022/12/01 - 10:37:59] Determining genes start and end positions from the coverage
[2022/12/01 - 10:38:03] Determining genes end positions from 3' coverage
[2022/12/01 - 10:38:12] Determining genes start positions from 5' coverage
[2022/12/01 - 10:38:21] Intersecting genes start and end positions from the coverage analysis and cuts
[2022/12/01 - 10:38:21] Combining the results in a single file
[2022/12/01 - 10:38:21] Finding splicing isoforms
Traceback (most recent call last):
  File "/home/jayradke/UNAGI/app/unagi.py", line 812, in <module>
    main(sys.argv[1:])
  File "/home/jayradke/UNAGI/app/unagi.py", line 179, in main
    filterByCoverage(os.path.join(transitionnalOutputPath,config["raw_splice_sites_file"]),os.path.join(transitionnalOutputPath,config["total_coverage_file"]),os.path.join(transitionnalOutputPath,config["coverage_filtered_splice_sites_file"]))
  File "/home/jayradke/UNAGI/app/unagi.py", line 554, in filterByCoverage
    valid = valid or (readCount != 0 and coverageMap[chr][int(spliceSite)] / readCount < int(config["max_coverage_to_splice_ratio"]))
IndexError: list index out of range
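The failing line divides `coverageMap[chr][int(spliceSite)]` by `readCount`, so the `IndexError` most likely means a splice-site position fell at or beyond the end of the coverage list stored for that chromosome. The sketch below is not UNAGI's code. It assumes, hypothetically, that the coverage map is a dict of per-chromosome Python lists indexed by 0-based position, and uses a hypothetical helper `passes_coverage_filter` to show how the unguarded lookup fails and how a bounds check would avoid it. Treating out-of-range sites as failing the filter is an assumption, not a confirmed fix.

```python
from typing import Dict, List

# Hypothetical stand-in for UNAGI's coverage map (assumption): one list of
# per-position read depths per chromosome, indexed by 0-based position.
CoverageMap = Dict[str, List[int]]


def passes_coverage_filter(coverage_map: CoverageMap, chrom: str, position: int,
                           read_count: int, max_ratio: float) -> bool:
    """Bounds-checked version of the ratio test from the traceback (hypothetical)."""
    depths = coverage_map.get(chrom)
    if depths is None or not 0 <= position < len(depths):
        # Unknown chromosome or position outside the recorded coverage:
        # the unguarded expression would raise IndexError (or, for a negative
        # position, silently wrap around), so treat the site as failing.
        return False
    return read_count != 0 and depths[position] / read_count < max_ratio


coverage = {"chr1": [0, 5, 9, 3]}  # coverage recorded for 4 positions only

try:
    coverage["chr1"][10] / 2  # unguarded lookup, like the failing line
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range

print(passes_coverage_filter(coverage, "chr1", 2, 3, 5))   # True  (9 / 3 = 3.0 < 5)
print(passes_coverage_filter(coverage, "chr1", 10, 3, 5))  # False (out of range)
```

If the coverage lists only extend as far as reads reach, or use a different coordinate base than the splice-site file, sites near a chromosome end could fall outside them. That is one plausible cause, not a diagnosis.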

imet-k commented 1 year ago

Thank you for using UNAGI. Can you share the limited data set? Regards,

jayradke commented 1 year ago

We ran the entire data set (~4 million long cDNA reads) and got the same error. These are ONT cDNA libraries from Syrian hamster lung tissue. Do you have any suggestions on whether we need to change anything in the Advanced Options?

imet-k commented 1 year ago

Can you share the protocol for library preparation? I am not sure, but I suspect your reads are not stranded. If the protocol uses universal primers, they can be used to strand (orient) the reads. Our code was designed for stranded reads, and non-stranded reads may cause issues. In any case, I am also working on adding an option for non-stranded reads.
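The idea of using universal primers for stranding can be sketched as follows: find which primer sits near the 5' end of a read and reverse-complement the read if it is the antisense one. This is a minimal sketch, not UNAGI's code or any kit's documented procedure. `SENSE_PRIMER` and `ANTISENSE_PRIMER` are hypothetical placeholder sequences (substitute the kit's real primers), and the exact substring matching is a simplification; nanopore reads need error-tolerant matching, which dedicated tools such as ONT's pychopper provide.

```python
from typing import Optional

# Hypothetical placeholder primers: replace with the kit's actual sequences.
# SENSE_PRIMER is assumed to sit at the 5' end of reads running in the
# transcript's sense direction; ANTISENSE_PRIMER at the 5' end of reads
# running the other way.
SENSE_PRIMER = "ACTGACTGAC"
ANTISENSE_PRIMER = "GGTTCCAAGG"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA sequence (A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_read(read: str, window: int = 100) -> Optional[str]:
    """Return the read in sense orientation, or None if it cannot be classified.

    Only the first `window` bases are searched, using exact matching.
    """
    head = read[:window]
    has_sense = SENSE_PRIMER in head
    has_antisense = ANTISENSE_PRIMER in head
    if has_sense and not has_antisense:
        return read                       # already in sense orientation
    if has_antisense and not has_sense:
        return reverse_complement(read)   # flip to sense orientation
    return None                           # no primer found, or ambiguous


insert = "A" * 200
sense_read = SENSE_PRIMER + insert + reverse_complement(ANTISENSE_PRIMER)
antisense_read = reverse_complement(sense_read)

print(orient_read(sense_read) == sense_read)      # True
print(orient_read(antisense_read) == sense_read)  # True
print(orient_read(insert))                        # None
```

Reads returned as `None` would typically be set aside rather than guessed, since assigning them a strand at random is exactly what makes non-stranded input problematic.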

jayradke commented 1 year ago

Protocol for library prep was ONT cDNA-PCR Seq (SQK-PCS109).

Using a direct RNA library prep (ONT SQK-RNA002) of adenovirus-infected A549 cells, we were able to get UNAGI to run fine when mapping to the Ad genome. I will try this data set again, mapping to the human genome, to see if it completes.