krasileva-group / plant_rgenes


Output from K-parse_Pfam_domains_NLR-fusions-v2.2.pl missing #3

Closed awixom18 closed 8 years ago

awixom18 commented 8 years ago

When I run K-parse_Pfam_domains_NLR-fusions-v2.2.pl on the verbose output from K-parse_Pfam_domains_v3.1.pl, I no longer get any errors, but the output is blank. All of the files that are supposed to be written contain only the defined header lines. They include none of the information from my db_description.tsv and none of the domain counts. I really have no idea what could be causing this issue.

The command being used is: perl ./processing_scripts/K-parse_Pfam_domains_NLR-fusions-v2.2.pl -i ./master_dir/ -o ./master_dir/ -d ./master_dir/Ssisymbriifolium/pfam/db_description.tsv

Is there any information I can provide to help resolve this issue?

Thank you for your help,

Alex

krasileva commented 8 years ago

Hi Alex, your usage is correct.

my $usage = "usage: perl script.pl
-i|--indir directory for batch retrieval of input pfamscan.parsed.verbose files
-e|--evalue evalue cutoff for determining domain fusions [default 1e-3]
-o|--output output directory
-d|--db_description description of datasets used in the analyses [Organism Species_ID NCBI_taxon_ID Family Database Date_aquired Restrictions Version Common_Name Source Reference]
";

Could you check whether the outputs from the previous step are named correctly and can be retrieved, and whether they are symbolic links or real files?

That is, if you run 'find -L ./master_dir/ -name "pfamscan.parsed.verbose"', do you get all the input files that you would like this script to process?

If you need to use 'find ./master_dir/ -name "pfamscan.parsed.verbose"' instead, please modify line 61 of the code accordingly, and accept my apologies. It seems that I was using symbolic links to point to the subset of parsed files I wanted to analyze and did not document it well enough.
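A minimal sketch of why the -L flag can matter, assuming a hypothetical layout in which a species directory under master_dir is a symbolic link. The directory names, the file name Sp1_pfamscan.parsed.verbose and the "*.parsed.verbose" pattern are all made up for illustration; everything is built in a temporary directory.

```shell
#!/bin/sh
# Hypothetical layout in a temp dir so nothing real is touched.
tmp=$(mktemp -d)
mkdir -p "$tmp/real/Sp1/pfam" "$tmp/master_dir"
touch "$tmp/real/Sp1/pfam/Sp1_pfamscan.parsed.verbose"

# The species directory inside master_dir is only a symbolic link.
ln -s "$tmp/real/Sp1" "$tmp/master_dir/Sp1"

# Without -L, find does not descend into the symlinked directory.
plain=$(find "$tmp/master_dir/" -name "*.parsed.verbose" | wc -l | tr -d ' ')

# With -L, find follows the link and reaches the file.
follow=$(find -L "$tmp/master_dir/" -name "*.parsed.verbose" | wc -l | tr -d ' ')

echo "without -L: $plain file(s), with -L: $follow file(s)"
rm -rf "$tmp"
```

If both counts agree on your own directory tree, symbolic links are not what is hiding the input files.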

Best wishes,

Ksenia

awixom18 commented 8 years ago

I have checked, and using both find procedures (with -L and without -L) appears to retrieve the correct input files. Unfortunately, it still doesn't give me any output other than the defined headers. I've converted the tsv files to txt so I could attach them here. This is the only output I am receiving:

nlrsd_by_prevalence03302016.txt
nlrsd_domains-03302016.stats.txt
nlrsd_summary_table03302016.txt

The word cloud file is empty and therefore will not attach.

Is there any other information I can provide?

Thanks for your time and effort,

Alex

krasileva commented 8 years ago

Hi Alex,

Thank you for confirming. I am happy to make this work!

There are a few sanity checks in the script (this is the least generic script in the pipeline and is really adapted to the analyses in the paper).

The first check takes the species name from the name of the file and cross-checks it against the database file:

Line 73 my ($species)=split("_pfamscan", $basefile);
Line 74 $species =~ s/.protein.fa//;
Line 75 $species =~ s/.fa//;

Line 77 if ( defined ( $db{$species} ))

For you, $species should at this point contain 'Ssisymbriifolium', and this should match the database. If it does, the script will print the species name and the family name on screen.

Line 81 print $species, "\t", $family, "\n";

So the very first output while running the script should be

Ssisymbriifolium Solanaceae

Do you see this retrieved correctly by the script? (this should be on screen)
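The check described above can be approximated outside the script. This is only a sketch: the input file name and the database line are hypothetical (the real ones were not posted), and it assumes the lookup key is the Species_ID column (column 2) with Family in column 4, following the column order in the usage message. The sed expressions reuse the patterns from lines 73 to 75 as quoted.

```shell
#!/bin/sh
# Hypothetical inputs; "NA" is a placeholder, not a real taxon ID.
basefile="Ssisymbriifolium.protein.fa_pfamscan-03302016.parsed.verbose"
db=$(mktemp)
printf 'Solanum sisymbriifolium\tSsisymbriifolium\tNA\tSolanaceae\n' > "$db"

# Cut the name at "_pfamscan", then drop the FASTA suffixes
# (same patterns as the quoted Perl lines 73-75).
species=$(printf '%s\n' "$basefile" |
    sed -e 's/_pfamscan.*//' -e 's/.protein.fa//' -e 's/.fa//')

# Assumption: the database is keyed on column 2 and the family is column 4.
family=$(awk -F '\t' -v key="$species" '$2 == key { print $4 }' "$db")

if [ -n "$family" ]; then
    printf '%s\t%s\n' "$species" "$family"
else
    echo "no database entry for '$species'" >&2
fi
rm -f "$db"
```

With these made-up inputs the lookup succeeds. Pointing basefile and db at the real file name and db_description.tsv shows whether the derived name has an exact match in the database.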

Cheers,

Ksenia

awixom18 commented 8 years ago

Hi Ksenia,

So when I run those parts of the script in single-step debugging:

$species is defined as Ssisymbriifolium after lines 73-75.

Once line 77 is run, the script jumps to line 149, and no $family is defined or anything else for that matter...

I have attached part of the debugging line by line to show what I did.

Cheers,

Alex

Debuggedlines.txt

krasileva commented 8 years ago

Ok, I see that the species could not be inferred from the name of the input file.

Could you send me the exact name of the input file (from parsed pfamscan) and a line from your database file where this species is mentioned?
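One way to prepare that database line so invisible characters are visible is sketched below. The thread does not record what the actual cause was. The trailing space and Windows line ending in this hypothetical line are just examples of things that can make an exact lookup fail while the name looks right on screen, and column 2 as the key is again an assumption.

```shell
#!/bin/sh
# Hypothetical database line: trailing space in the key column, CRLF line ending.
db=$(mktemp)
printf 'Solanum sisymbriifolium\tSsisymbriifolium \tNA\tSolanaceae\r\n' > "$db"

# Print the matching line with tabs and carriage returns made visible
# (the "l" command of sed is POSIX).
grep 'Ssisymbriifolium' "$db" | sed -n 'l'

# Count exact matches on column 2.
exact=$(awk -F '\t' '$2 == "Ssisymbriifolium"' "$db" | wc -l | tr -d ' ')

# Count matches after trimming surrounding whitespace from column 2.
loose=$(awk -F '\t' '{ k = $2; gsub(/^[ \t]+|[ \t]+$/, "", k) } k == "Ssisymbriifolium"' "$db" |
    wc -l | tr -d ' ')

echo "exact matches: $exact, matches after trimming: $loose"
rm -f "$db"
```

If the two counts differ on the real db_description.tsv, stray whitespace in the key column is worth ruling out before looking elsewhere.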

I could also debug here if you send me both input files: Ksenia.Krasileva@tgac.ac.uk

Cheers,

Ksenia

awixom18 commented 8 years ago

I have emailed you the files.

Thank you for all your help!!

Cheers,

Alex

krasileva commented 8 years ago

Got the files! Let's discuss by e-mail.

Cheers,

Ksenia