hahnlab / CAFE5

Version 5 of the CAFE phylogenetics software
Other
113 stars 22 forks

vector::_M_range_check: __n (which is 759) >= this->size() (which is 759) Segmentation fault (core dumped) #65

Closed ehrenbentz closed 1 year ago

ehrenbentz commented 3 years ago

Hi, I am running cafe5 on a dataset of protein families assigned by pfamscan.pl. The unassigned proteins were then clustered de novo with MCL, as described in your tutorial.

Running cafe5 on "filtered_cafe_input.txt" works fine: it completed and gave me a lambda estimate, which I then passed with -l when running "large_filtered_cafe_input.txt".
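For anyone reproducing this setup: the two-pass workflow described above (estimate lambda on the smaller families, then reuse it with -l on the large ones) starts from splitting the count table by family size. Below is a minimal sketch of that split. It assumes the tab-separated CAFE5 input layout (a header row, then Desc and Family ID columns followed by one integer count column per species). The function names split_by_max_count and to_tsv and the default cutoff of 100 copies are illustrative assumptions, not the tutorial's own script.

```python
import csv
import io


def split_by_max_count(text, cutoff=100):
    """Split a CAFE5-style count table into (small, large) lists of rows.

    Assumed layout: tab-separated, header row, columns 'Desc' and
    'Family ID', then one integer count column per species.
    A family goes to `large` if any species has >= cutoff copies.
    The header row is copied into both outputs.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, families = rows[0], rows[1:]
    small, large = [header], [header]
    for row in families:
        if not row:
            continue  # skip blank lines
        counts = [int(c) for c in row[2:]]
        (large if max(counts) >= cutoff else small).append(row)
    return small, large


def to_tsv(rows):
    """Serialize rows back to tab-separated text, one family per line."""
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Writing to_tsv(small) and to_tsv(large) to two files gives the small-family input for estimating lambda and the large-family input to run with that lambda fixed.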

When running cafe5 on the large protein families, I get the error below. One of my protein families is very large (over 1000 members in 3 of my species). I'm not entirely sure what this error message points to, but it looks as though cafe5 will not run if any protein family has more than 759 members. Removing the large family allows cafe5 to complete, but I want to obtain the result for the large family as well.

I don't know where this threshold comes from, whether it can or should be changed, or whether I need to break these protein families into smaller pieces. Thoughts? Thanks!


Running with a single core:

Command line: cafe5 -i large_filtered_cafe_input.txt -t Ultrametric.tree -c 1 -l 0.0077540923377588 -o large_results

Filtering families not present at the root from: 18 to 17

No root family size distribution specified, using uniform distribution
Inferring processes for Base model
Score (-lnL): inf
Maximum possible lambda for this topology: 0.03009023399189
Computing pvalues...
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 759) >= this->size() (which is 759)
Aborted (core dumped)


Also, in case it helps others who run into this (or a similar) issue: running with all 64 cores returns only a segmentation fault, without the informative std::out_of_range message. I didn't see the full error message until I ran with a single core:

Command line: cafe5 -i large_filtered_cafe_input.txt -t Ultrametric.tree -c 64 -l 0.0077540923377588 -o large_results

Filtering families not present at the root from: 18 to 17

No root family size distribution specified, using uniform distribution
Inferring processes for Base model
Score (-lnL): inf
Maximum possible lambda for this topology: 0.03009023399189
Computing pvalues...
Segmentation fault (core dumped)

NayeliGutierrez commented 3 years ago

Hi @ehrenbentz,

I am having the exact same issue that you described. Did you find any solution?

Please share. I have already double-checked my input files and my CAFE installation, and I am using an HPC, so computing power should not be the problem. I don't know where else to look.

Best

ehrenbentz commented 3 years ago

Cafe5 will not run if any of your gene families contains more than 759 members. For large superfamilies such as ORs or GPCRs, you need to break them down into finer clusters.

This was the problem for me, and cafe5 ran perfectly once I did that. This may not be your problem, but check your inputs to see whether any of your family sizes are out of range.
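To check which families are out of range before running cafe5, a short scan of the input can report the largest per-species count in each family. This is a minimal sketch, assuming the tab-separated CAFE5 input layout (header row, Desc and Family ID columns, then one integer count column per species). oversized_families is a hypothetical helper, and the default limit of 759 is taken from the error message in this thread, not from CAFE5 documentation. The maintainer could not reproduce the issue, so treat the limit as a parameter to adjust rather than a confirmed threshold; counts at or above it are flagged to stay on the conservative side.

```python
import csv
import io


def oversized_families(text, limit=759):
    """Return (family_id, species, count) for each family whose largest
    per-species count is >= limit, sorted largest first.

    Assumed layout: tab-separated, header row, columns 'Desc' and
    'Family ID', then one integer count column per species.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    species = header[2:]  # species names from the header row
    hits = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts = [int(c) for c in row[2:]]
        top = max(counts)
        if top >= limit:
            # report the species that holds the largest count
            hits.append((row[1], species[counts.index(top)], top))
    return sorted(hits, key=lambda h: h[2], reverse=True)
```

On a real file, pass the file contents, for example oversized_families(open("large_filtered_cafe_input.txt").read()). Any family it reports is a candidate for re-clustering into finer groups.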

Hope that helps!

NayeliGutierrez commented 3 years ago

Thank you, @ehrenbentz! There is a place in the bioinformatics heaven for people like you who respond even when they have solved their problem. I was able to run CAFE successfully after dealing with family sizes that were out of range.

ehrenbentz commented 3 years ago

Awesome. Glad that helped!

benfulton commented 2 years ago

Hi @ehrenbentz, any chance you could send me the input files you were using? I can't reproduce the issue.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 90 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.