sjroth / ARTDeco

MIT License

ARTDeco breaks when inferring BAM file formats #9

Closed: paulocaldas closed this issue 2 years ago

paulocaldas commented 2 years ago

More often than not, I run into the following error:

[Screenshot of the error message]

From what I understand, something is wrong with one of the BAM files; it seems that some information is missing. As far as I can see, nothing is wrong with my BAM files. Even if one of the alignments went wrong, I cannot really tell by looking at the files (can I?). Also, ARTDeco doesn't tell us which BAM file causes the issue, which would be helpful for troubleshooting. The only solution I have found so far is to run the alignments again to overwrite possibly "corrupted" files, but I'm now working with folders of more than 100 files, so running the alignments again and again is not an option.
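One way to tell whether a BAM file was cut short, without re-running the alignment, is to look at its last 28 bytes: the SAM/BAM specification requires every BAM file to end with a fixed, empty BGZF block, and a file whose writer was interrupted typically lacks it. The sketch below scans a folder for `.bam` files missing that marker. `has_bgzf_eof` and `find_truncated_bams` are hypothetical helpers, not part of ARTDeco, and a file that passes this check can still be damaged in other ways. `samtools quickcheck` performs a similar check from the command line.

```python
import os
import sys

# The 28-byte empty BGZF block that the SAM/BAM specification requires at the end of a BAM file.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path: str) -> bool:
    """Return True if the file ends with the BGZF end-of-file marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False  # too small to even hold the marker
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)  # jump to the last 28 bytes
        return fh.read() == BGZF_EOF


def find_truncated_bams(bam_dir: str) -> list:
    """List the .bam files in bam_dir that lack the EOF marker (likely truncated)."""
    return sorted(
        name
        for name in os.listdir(bam_dir)
        if name.endswith(".bam") and not has_bgzf_eof(os.path.join(bam_dir, name))
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one suspect file name per line.
    for name in find_truncated_bams(sys.argv[1]):
        print(name)
```

Saved as, say, `check_bams.py` and run as `python check_bams.py /path/to/bam_files_dir`, it prints the names of the suspect files, which could then be removed or re-aligned before running ARTDeco.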

Here's the command I'm using: ARTDeco -home-dir /.../bam_files_dir -bam-files-dir /.../artdeco_output/ -gtf-file human.modified.annotation.file.gencodev37.gtf -cpu 8 -chrom-sizes-file human.genome.chrom.sizes -min-dog-len 2000 -dog_window 200 -min_dog_coverage 0.2

sjroth commented 2 years ago

I'm not sure that it's a corrupted BAM file. Did you verify that your modified GTF file is formatted properly?

paulocaldas commented 2 years ago

Yes. I've been using the same GTF file for a while now.

sjroth commented 2 years ago

Do you know the file formats already? Is there a reason you have to infer them?

paulocaldas commented 2 years ago

I know that all of them are paired-end reads, and it works fine most of the time. I just don't understand why it crashes in some cases, and I can't find the variable that creates this issue. My initial guess was that I might sometimes be running alignments (I use STAR, by the way) when the server crashes, for example, and that this could leave something odd inside the BAM file being generated at that moment. I don't have a good way to check this unless I run the alignment again to regenerate the BAM file (I hope this makes sense).

sjroth commented 2 years ago

I would suggest that you specify the format of the BAM files and then try again. I can't troubleshoot if you are having alignment issues as this isn't something that I can evaluate. I can tell you that ARTDeco has successfully been run on entire TCGA cohorts so I doubt it is a file size issue.
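To confirm the layout before specifying it, the paired flag (0x1) on the alignment records is enough for the paired-end versus single-end question. The sketch below reads the first records of a BAM file using only the Python standard library (BGZF is gzip-compatible, so `gzip` can decompress it). `bam_is_paired` and the 1,000-record sample size are illustrative assumptions rather than ARTDeco behavior, and the check says nothing about strandedness or read orientation.

```python
import gzip
import struct


def bam_is_paired(path: str, n_records: int = 1000) -> bool:
    """Return True if any of the first n_records alignments has the 0x1 (paired) flag set."""
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError(f"{path}: not a BAM file (bad magic)")
        # Skip the SAM header text.
        (l_text,) = struct.unpack("<i", fh.read(4))
        fh.read(l_text)
        # Skip the reference list: each entry is l_name, name, l_ref.
        (n_ref,) = struct.unpack("<i", fh.read(4))
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", fh.read(4))
            fh.read(l_name + 4)
        for _ in range(n_records):
            size_bytes = fh.read(4)
            if len(size_bytes) < 4:
                break  # no more records
            (block_size,) = struct.unpack("<i", size_bytes)
            record = fh.read(block_size)
            # FLAG is a uint16 at byte 14 of the record (after refID, pos,
            # l_read_name, mapq, bin and n_cigar_op).
            (flag,) = struct.unpack_from("<H", record, 14)
            if flag & 0x1:
                return True
    return False
```

A `True` result means at least one sampled read is flagged as paired; a library that mixes paired and unpaired reads would need a fuller count.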

paulocaldas commented 2 years ago

By trial and error, I can now see that this error is caused by alignment issues. In my case, STAR fails to complete the alignment (because the disk runs out of space, for example), and I end up with some information missing from the BAM file (as far as I understand). So the problem is not related to ARTDeco directly but to the upstream analysis. I guess you can close this issue now, but it would be great if ARTDeco could tell us which BAM file is causing the problem when we get the message above; that way we could just get rid of the incomplete BAM file. (Something like: the split function fails when inferring the BAM file format, "list index out of range", but which file(s)?)
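The suggestion of naming the offending file could look roughly like the following. This is not ARTDeco's code: `infer_formats` is a hypothetical helper that runs a per-file inference function and collects failures with their file names instead of stopping at the first exception. The `REPORTS` dictionary and `parse_report` are made-up stand-ins that reproduce the "list index out of range" error from an empty report, as a truncated BAM file might produce.

```python
from typing import Callable, Dict, List, Tuple


def infer_formats(
    bam_files: List[str], infer: Callable[[str], str]
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Run `infer` on every BAM file and collect failures instead of stopping at the first one."""
    formats: Dict[str, str] = {}
    failures: List[Tuple[str, str]] = []
    for path in bam_files:
        try:
            formats[path] = infer(path)
        except (IndexError, ValueError, EOFError, OSError) as exc:
            # Keep the file name next to the error so the bad BAM can be removed or re-aligned.
            failures.append((path, f"{type(exc).__name__}: {exc}"))
    return formats, failures


# Hypothetical per-file reports; the empty one mimics output from a truncated BAM.
REPORTS = {"sample1.bam": "layout: PE", "sample2.bam": ""}


def parse_report(path: str) -> str:
    # "".split(":") gives [""], so [1] raises "IndexError: list index out of range".
    return REPORTS[path].split(":")[1].strip()


formats, failures = infer_formats(sorted(REPORTS), parse_report)
for path, error in failures:
    print(f"Could not infer format for {path} ({error})")
```

With these stand-in reports, the loop prints `Could not infer format for sample2.bam (IndexError: list index out of range)`, while `sample1.bam` is still processed.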

sjroth commented 2 years ago

I'm glad that you were able to solve this issue. I might consider implementing your suggestion if it is a recurring issue.