kr-colab / FILET

Software for detecting introgression using supervised machine learning
GNU General Public License v3.0

Mask File Example Format #4

Closed: stsmall closed this issue 6 years ago

stsmall commented 6 years ago

Hi @andrewkern, @dschride, I have already masked repeats and low-confidence calls in my FASTA files and would also like to include a mask file with the training data. I am a little unclear about the format of this file; could you provide a brief example, please? Thanks, @stsmall

stsmall commented 6 years ago

@dschride answered my question in a separate email. I am including the answer here:

cat "simulationInputFile" | msMaskAllRows "maskFileName" | python removeNedOutColumnsFromMsFile.py stdin > "outputFile"

Here, simulationInputFile is the path to your ms-style simulation input, and maskFileName is the path to the mask file you are asking about. The masked version will be written to outputFile. I will illustrate the format of the mask file by example:

0 0.1 0.2
0 0.3 0.4
//
0 0.05 0.06
0 0.5 0.9
0 0.95 0.99
//

The above file can be used to mask a simulation file with two replicates. In the first replicate, all positions between 0.1 and 0.2 and between 0.3 and 0.4 will be masked (the 0 at the beginning of each line can be ignored). In the second replicate (after the first //), three different stretches of sequence will be masked. Note that the values on each line are space-separated.
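To make the format concrete, here is a minimal Python sketch (not part of FILET) that reads a mask file like the example above and drops masked sites from one ms replicate. It assumes, based on the description above, that each replicate is a run of "0 start end" triples terminated by //, and that interval endpoints are inclusive, which the answer does not specify. The helper names parse_mask, is_masked and drop_masked_sites are hypothetical. The real pipeline apparently masks sites with msMaskAllRows and then removes the masked columns with removeNedOutColumnsFromMsFile.py; this sketch folds both steps into one.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def parse_mask(text: str) -> List[List[Interval]]:
    """Return one list of (start, end) intervals per replicate.

    Assumption: each replicate is a sequence of "0 start end" triples
    followed by a "//" token. Splitting on all whitespace means the
    one-triple-per-line layout and a flattened single-line layout parse
    identically.
    """
    replicates: List[List[Interval]] = []
    pending: List[str] = []  # tokens seen since the last "//"
    for tok in text.split():
        if tok != "//":
            pending.append(tok)
            continue
        if len(pending) % 3 != 0:
            raise ValueError(f"expected '0 start end' triples, got {pending}")
        # The leading 0 of each triple is ignored, as described in the answer.
        replicates.append(
            [(float(pending[i + 1]), float(pending[i + 2]))
             for i in range(0, len(pending), 3)]
        )
        pending = []
    if pending:
        # Assumption: every replicate, including the last, ends with "//".
        raise ValueError("trailing values without a closing '//'")
    return replicates


def is_masked(pos: float, intervals: List[Interval]) -> bool:
    # Assumption: interval endpoints are inclusive.
    return any(start <= pos <= end for start, end in intervals)


def drop_masked_sites(positions: List[float], haplotypes: List[str],
                      intervals: List[Interval]) -> Tuple[List[float], List[str]]:
    """Remove masked segregating sites (columns) from one ms replicate."""
    keep = [i for i, p in enumerate(positions) if not is_masked(p, intervals)]
    new_positions = [positions[i] for i in keep]
    new_haplotypes = ["".join(h[i] for i in keep) for h in haplotypes]
    return new_positions, new_haplotypes


if __name__ == "__main__":
    example = "0 0.1 0.2 0 0.3 0.4 // 0 0.05 0.06 0 0.5 0.9 0 0.95 0.99 //"
    masks = parse_mask(example)
    print(masks)
    # [[(0.1, 0.2), (0.3, 0.4)], [(0.05, 0.06), (0.5, 0.9), (0.95, 0.99)]]

    # A toy replicate with four sites; 0.15 and 0.35 fall in the first mask.
    pos, haps = drop_masked_sites([0.05, 0.15, 0.25, 0.35], ["0101", "1100"], masks[0])
    print(pos, haps)
    # [0.05, 0.25] ['00', '10']
```

Tokenizing the whole file rather than reading it line by line keeps the parser indifferent to how the triples are laid out, at the cost of not checking that each triple sits on its own line.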