iTaxoTools / TaxI2

Calculation and analysis of pairwise sequence distances
GNU General Public License v3.0

Consider inclusion of ASAPy in TaxI3 #34

Open mvences opened 2 years ago

mvences commented 2 years ago

The goal of this issue is mostly a feasibility check. Because many of the goals of TaxI3 (in the all-against-all mode) are related to those of ASAP (and ABGD, but we will focus on ASAP, which is the more modern program), it makes sense to think about a possible integration of ASAP functions in TaxI3. For instance, both ASAP and TaxI3 calculate a "barcode gap".

A true integration of these approaches would be extremely difficult and would require much additional code (for instance, if we tried to use ASAP to "test" the species information that is already present in the tabfile input).

However, a simpler integration would be similar to the integration of pyckmeans. So, in the all-against-all mode, the user could choose between simple clustering, pyckmeans clustering, and ASAP species delimitation:

  1. For the simple clustering, the program simply performs the clustering as it is doing now, and outputs the results along with the other output.

  2. For the pyckmeans clustering, we need to see how many user settings are necessary to properly run the program, so that it can either be run directly or requires a separate page or pop-up menu to adjust settings.

  3. For the ASAP species delimitation, we would need to find a way to run this separately from the other TaxI3 calculations. One option is a button that takes the user to a new window corresponding to the ASAPy window; there the ASAP analysis can be performed and the results written and saved, and a "Back" button returns to the regular TaxI3 window. Alternatively, the ASAP settings could be set in a pop-up menu; the program then first runs the regular TaxI3 analyses, subsequently runs ASAP, and provides all output (TaxI3 and ASAP) at once. An important aspect: if ASAP can be implemented this way, it will also be necessary to include some code from DNAconvert to convert the tabfile into FASTA so it can be run in ASAP. The same will probably be necessary for pyckmeans.
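To make the tabfile-to-FASTA step concrete, here is a minimal sketch using only the Python standard library. It is not the DNAconvert code; the column names `seqid` and `sequence` are assumptions chosen for illustration and would need to match the actual tabfile headers.

```python
import csv
import io


def tab_to_fasta(tab_text, id_column="seqid", seq_column="sequence"):
    """Convert tab-separated text with a header row into FASTA text.

    The default column names are hypothetical, not the real TaxI3 headers.
    """
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    records = []
    for row in reader:
        name = (row[id_column] or "").strip()
        sequence = (row[seq_column] or "").strip()
        if not sequence:
            continue  # skip specimens without sequence data
        records.append(f">{name}\n{sequence}\n")
    return "".join(records)


# Small made-up example with one extra metadata column
example = "seqid\tspecies\tsequence\ns1\tsp_A\tACGT\ns2\tsp_B\tACGA\n"
print(tab_to_fasta(example), end="")
```

The other metadata columns are simply dropped here; if ASAP or pyckmeans results are to be merged back into the tabfile later, the sequence identifiers in the FASTA headers would be the key for that.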

mvences commented 2 years ago

After thinking about this a bit more, I here propose a somewhat more useful workflow that could integrate ASAP. To be discussed!

Window I. After selecting the "All against all" program mode, the user can first choose whether to align the sequences or not:

  1. My sequences are already aligned and I want to proceed without further alignment
  2. My sequences are unaligned and I want to proceed without multiple alignment (comparisons will be done with pairwise alignments, or alignment-free)
  3. I want to align them with MAFFT before analysis [maybe the MAFFT options can already be present on this window and become active when the third option is selected]

-> Option 1 skips the alignment and proceeds to the next window II
-> Option 2 goes directly to window IV with the main TaxI3 analysis options
-> Option 3 performs the alignment and then the user proceeds to the next window II
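The routing out of Window I could be sketched roughly as follows. This only illustrates the three transitions listed above; the enum and function names are hypothetical and do not correspond to existing TaxI3 code.

```python
from enum import Enum


class AlignmentChoice(Enum):
    ALREADY_ALIGNED = 1        # option 1: proceed without further alignment
    NO_MULTIPLE_ALIGNMENT = 2  # option 2: pairwise or alignment-free comparisons
    ALIGN_WITH_MAFFT = 3       # option 3: align with MAFFT first


def next_window(choice):
    """Return (next window label, whether MAFFT must run first) for Window I."""
    if choice is AlignmentChoice.NO_MULTIPLE_ALIGNMENT:
        # Unaligned input skips windows II and III entirely
        return "IV", False
    if choice is AlignmentChoice.ALIGN_WITH_MAFFT:
        return "II", True
    return "II", False


for choice in AlignmentChoice:
    print(choice.name, next_window(choice))
```

Keeping the routing in one small function like this would make it easy to change the window order later without touching the GUI code.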

Window II. At this point the input file consists of aligned sequences. The user is given the following options:

  1. Proceed without ASAP or pyckmeans species delimitation / clustering information
  2. I want to add species delimitation information to my data based on an ASAP analysis (and possibly choose to overwrite previous species information if included)
  3. I want to add species delimitation information to my data based on a pyckmeans analysis (and possibly choose to overwrite previous species information if included)

-> Selecting option 1 proceeds to window III
-> Selecting option 2 or 3 will enable the options for ASAP or pyckmeans (not sure, but we can maybe try having these on the same window)

Window IIa. After running ASAP or pyckmeans, the user should be given the option to view and save the output files. However, one of the most important aspects is that we need some interactivity. ASAP needs to feed back the various candidate partitions and their ASAP scores, and the user can then choose whether one of these should be added to the data file for analysis (or used to overwrite existing information). For this exchange of information, we need to check whether SPART-XML might be the best data exchange format, since we should enable reading and writing of this format for the various programs anyway. The same applies to pyckmeans: I am not yet sure whether that program outputs only one or several clustering solutions, but in any case the user should see these in the GUI and be able to choose a clustering solution to be added to the data. Here we also need the option to produce and save a new data file (either tab or spart.xml) with the original sequences and metadata, plus the new partition information.
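A rough sketch of the partition-selection step, independent of the exchange format (tab or SPART-XML). The data structures, the `seqid` and `species` keys, and the scores are all made up for illustration. Candidates are sorted ascending on the assumption that a lower ASAP score ranks better; the final choice is meant to stay with the user.

```python
def rank_partitions(partitions):
    """Sort candidate partitions by score, lowest first (assumed to be best)."""
    return sorted(partitions, key=lambda p: p["score"])


def apply_partition(records, assignment, column="species", overwrite=False):
    """Return copies of the records with the chosen partition written to `column`.

    Existing values are kept unless overwrite=True; the input is not modified.
    """
    updated = []
    for record in records:
        new_record = dict(record)
        subset = assignment.get(record["seqid"])
        if subset is not None and (overwrite or not new_record.get(column)):
            new_record[column] = subset
        updated.append(new_record)
    return updated


# Hypothetical input: two specimens, one without species information
records = [
    {"seqid": "s1", "species": "sp_A"},
    {"seqid": "s2", "species": ""},
]
# Hypothetical candidate partitions with invented scores
candidates = [
    {"score": 4.5, "assignment": {"s1": "subset1", "s2": "subset1"}},
    {"score": 2.0, "assignment": {"s1": "subset1", "s2": "subset2"}},
]
chosen = rank_partitions(candidates)[0]  # in the GUI, the user would pick this
print(apply_partition(records, chosen["assignment"]))
```

The `overwrite` flag corresponds to the "overwrite previous species information" choice in Window II; writing to a separate column instead would be a safer default if both the original and the new partition should be kept.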

Window III. This window would be about the treefile to be used for analysis. The user options should be:

  1. I want to proceed without phylogenetic information (no genetic distances of sister species and no patristic distances will be calculated)
  2. I want to proceed with the input tree
  3. I want to calculate a new tree from the sequence data using FastTree

These options should be self-explanatory. If option 3 is selected, the program will run FastTree on the (aligned) sequences to calculate a tree, and use this treefile for the final analyses.
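For option 3, calling FastTree from Python could look roughly like the sketch below. It assumes a nucleotide alignment in FASTA format and an executable named `FastTree` on the PATH (the name differs between installations); the function names and paths are placeholders, not existing TaxI3 code.

```python
import subprocess


def fasttree_command(alignment_path, executable="FastTree"):
    """Build the FastTree call for a nucleotide alignment (-nt)."""
    return [executable, "-nt", alignment_path]


def run_fasttree(alignment_path, tree_path, executable="FastTree"):
    """Run FastTree and save the Newick tree it prints to stdout."""
    with open(tree_path, "w") as tree_file:
        subprocess.run(
            fasttree_command(alignment_path, executable),
            stdout=tree_file,
            check=True,  # raise CalledProcessError if FastTree fails
        )


# Shows the command only; nothing is executed here
print(fasttree_command("aligned.fasta"))
```

Calling `run_fasttree("aligned.fasta", "tree.nwk")` would then produce the treefile that the final analyses use in place of a user-supplied input tree.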

Window IV. This is then the final window for TaxI analyses as described previously. Unlike what I previously suggested, pyckmeans will not be available here (but simple clustering is still available). We still need to add here the options from phylogenyanalyzer to output patristic distances, distances between sister species, and a coloured tree.