veg / hyphy-analyses

HyPhy standalone analyses
MIT License

--remove-duplicates #50

Open sjellerstrand opened 8 months ago

sjellerstrand commented 8 months ago

Hello! I'm trying to remove duplicate sequences using https://github.com/veg/hyphy-analyses/tree/master/remove-duplicates.

It all works well when I run it on the example files, or on one of my alignments by itself. However, I run into problems when I also supply a tree to be trimmed along with my alignment. I then get the following error message:

Error:
'/crex/proj/snic2020-2-25/bin/hyphy-analyses/remove-duplicates/example1.nwk' could not be opened for reading by fscanf. Path stack:
        /proj/snic2020-2-25/nobackup/simon/conda/envs/hyphy/share/hyphy/
        /crex/proj/snic2020-2-25/bin/hyphy-analyses/remove-duplicates/ in call to fscanf(filter.tree,"Raw",filter.tree_string);

Function call stack
1 :  fscanf(filter.tree,"Raw",filter.tree_string);

        Keyword arguments:
                {
                 "output":"./uniq_seq"
                }
-------

Check errors.log for execution error details.

The program does seem to make some progress if I rename my tree file to "example.nwk". But I still get the following error message, which refers to the sequence names from the example files:

Error:
Node 'seq_991' not found in the tree or is the root node in _List _TreeTopology::RemoveANode(HBLObjectRef)

Function call stack
1 :  T-utility.Keys(filter.delete_leaves);

        Keyword arguments:
                {
                 "output":"./uniq_seq"
                }
-------

Check errors.log for execution error details.

Since I mainly work with population data, my alignments contain many conspecific individuals. Duplicates therefore occur often, and I suspect that removing them would speed up my analysis significantly, since I want to loop this over all genes in the genome.

Thank you!

Simon
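
The second error above ("Node 'seq_991' not found in the tree") suggests that the sequence names being processed and the leaf names in the tree do not match. A minimal sketch of a pre-flight check in Python follows. It is not part of HyPhy, and the function names are hypothetical. It assumes a FASTA alignment, an unquoted Newick tree, and that sequence names are the first whitespace-delimited token of each FASTA header.

```python
import re


def fasta_names(fasta_text):
    """Return sequence names: the first token after '>' on each header line."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names


# In Newick, a leaf name directly follows '(' or ','.
# Internal node labels follow ')' and are therefore not matched.
# Assumption: names are unquoted and contain no whitespace.
_LEAF = re.compile(r"[(,]\s*([^():,;\s]+)")


def newick_leaves(newick_text):
    """Return the leaf names found in an unquoted Newick string."""
    return _LEAF.findall(newick_text)


def compare_names(fasta_text, newick_text):
    """Return (names missing from the tree, leaves missing from the alignment)."""
    seqs = set(fasta_names(fasta_text))
    leaves = set(newick_leaves(newick_text))
    return sorted(seqs - leaves), sorted(leaves - seqs)
```

If both returned lists are empty, the alignment and tree agree on names under the assumptions above. Any name that appears in only one of the two files is a likely candidate for the "not found in the tree" error.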

spond commented 8 months ago

Dear @sjellerstrand,

Can you include the command you use to call hyphy? One suggestion is to use absolute paths and see if that helps.

Best, Sergei
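
To illustrate the absolute-path suggestion: the first error shows the tree file being looked up under the analysis directory, which may not be where the data actually live. A minimal Python sketch for resolving and checking input paths before calling hyphy follows. The helper name `resolve_input` is hypothetical and not part of HyPhy.

```python
import os


def resolve_input(path):
    """Return the absolute path of an existing, readable file, or raise."""
    # Expand '~' and resolve relative to the current working directory.
    absolute = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(absolute):
        raise FileNotFoundError(f"not a file: {absolute}")
    if not os.access(absolute, os.R_OK):
        raise PermissionError(f"not readable: {absolute}")
    return absolute
```

The resolved paths for the alignment and the tree can then be passed on the hyphy command line in place of the relative ones, so that a missing or misnamed file is reported before the analysis starts.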