jgluck / VALET

MetaGenomic Validation Pipeline

ORF GFF file filter for summary #16

Open jgluck opened 10 years ago

jgluck commented 10 years ago

We would like to be able to provide a GFF-formatted file of ORFs and have the results filtered so that only misassemblies that overlap ORFs are reported.

jgluck commented 10 years ago

@cmhill Question.

Currently I wait for summary.gff to be created. I then run through summary.gff and create orf_filtered_summary.gff, which contains only those lines that overlap ORFs from a provided, sorted, GFF-formatted file of ORFs.

Do you have a different vision for this output, or do you think this is fine? If ORFs exist, I have to add columns to the table file. That's all that's left now.
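
For reference, the filtering step described above could be sketched roughly as follows. This is a minimal illustration, not the code in pipeline.py: the function names `parse_gff_line` and `filter_by_orfs` are hypothetical, and it assumes standard tab-separated GFF lines with the sequence ID in column 1 and 1-based inclusive start/end in columns 4 and 5. It compares every summary line against every ORF on the same sequence, so it does not depend on either file being sorted.

```python
def parse_gff_line(line):
    """Return (seqid, start, end) from one tab-separated GFF line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[3]), int(fields[4])


def filter_by_orfs(summary_lines, orf_lines):
    """Keep only summary lines that overlap at least one ORF on the same sequence."""
    # Group ORF intervals by sequence ID so each summary line only
    # looks at ORFs on its own contig.
    orfs_by_seq = {}
    for line in orf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comments/directives
        seqid, start, end = parse_gff_line(line)
        orfs_by_seq.setdefault(seqid, []).append((start, end))

    kept = []
    for line in summary_lines:
        if not line.strip() or line.startswith("#"):
            continue
        seqid, start, end = parse_gff_line(line)
        # GFF coordinates are inclusive, so intervals that share a
        # single base count as overlapping.
        if any(start <= o_end and o_start <= end
               for o_start, o_end in orfs_by_seq.get(seqid, [])):
            kept.append(line)
    return kept
```

Writing the returned lines to orf_filtered_summary.gff would reproduce the output described above, at the cost of a quadratic comparison per contig.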

cmhill-zz commented 10 years ago

Make sure the reported ORF misassemblies are the intersection between the misassembly and the ORF in question. You are going to have cases where a misassembled region might span 10kb, and there might be multiple ORFs within that region.
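
To make the intersection point concrete, here is a small sketch of clipping one misassembled region against several ORFs. The function name `clip_to_orfs` and the tuple representation are assumptions for illustration only; coordinates are taken to be 1-based and inclusive, as in GFF.

```python
def clip_to_orfs(region, orfs):
    """Return one (start, end) interval per ORF that overlaps the region.

    `region` is a (start, end) tuple for the misassembly and `orfs` is a
    list of (start, end) tuples on the same sequence.
    """
    r_start, r_end = region
    clipped = []
    for o_start, o_end in orfs:
        # The intersection of two inclusive intervals runs from the
        # larger start to the smaller end.
        start = max(r_start, o_start)
        end = min(r_end, o_end)
        if start <= end:  # the intervals share at least one base
            clipped.append((start, end))
    return clipped
```

With this approach a single 10kb misassembly that covers three ORFs would be reported as three separate, shorter records, one per ORF.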

The big thing is adding the columns to the table. You're going to have to modify generate_summary_table in pipeline.py.

jgluck commented 10 years ago

Almost done. Testing it on NOT Carsonella. I'm making the assumption that the ORFs and the current summary are in the same order. I believe this assumption is incorrect, so I should probably be sorting both.
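
One way to sort both files consistently, sketched under the assumption that each is a plain tab-separated GFF file: order the records by sequence ID, then by numeric start and end. The helper names `gff_sort_key` and `sorted_gff` are hypothetical and not part of pipeline.py.

```python
def gff_sort_key(line):
    """Sort key: sequence ID first, then numeric start and end."""
    fields = line.rstrip("\n").split("\t")
    return (fields[0], int(fields[3]), int(fields[4]))


def sorted_gff(lines):
    """Return the feature lines of a GFF file in a consistent order."""
    # Drop blank lines and comments/directives before sorting, since
    # they have no coordinates to sort on.
    records = [l for l in lines if l.strip() and not l.startswith("#")]
    return sorted(records, key=gff_sort_key)
```

Applying the same key to the ORF file and to summary.gff puts both in the same order, which is what a single-pass comparison of the two files would need.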

jgluck commented 10 years ago

@cmhill I just pushed the changes to a new branch called "orf-filter". I didn't want to pollute the current code base. If you open pipeline.py and search for "orf", the changes should show up.

The lines in question are: https://github.com/jgluck/VALET/blob/orf-filter/src/py/pipeline.py#L181-L195