mapleforest / HaploMerger2


Inferring score matrix for a given genome sequence #1

Open danshu opened 7 years ago

danshu commented 7 years ago

Hi,

I'm trying to infer a score matrix for my genome. I have two questions:

  1. Does the genome have to be soft-masked to be used for inferring the score matrix?
  2. In the manual: "To divide the genome sequence file into two parts one contains 5~15% sequences in size, and the other contains the rest of the genome sequences. The use of >10% genome sequence is not recommended, because the chance of finding no allele for a scaffold is high and hence the inference is less reliable." So does this mean that the first part should contain 5-10% of the sequences by size?

Many thanks, Danshu

mapleforest commented 7 years ago

LASTZ requires a soft-masked genome to run fast, but for score inference a hard-masked genome is also OK. You can use the longest 10% of the sequences to align against the remaining 90%; this is how it is supposed to work. If you are interested, you can test 5% vs 95%, 10% vs 90%, and 15% vs 85%, and you can test with --identity=90..100, 85..100, 90..99, and 85..99, and then choose the most suitable one.
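
For anyone who wants to prepare the two parts by hand, here is a minimal Python sketch of the "longest ~10% by size vs. the rest" split. It is not HaploMerger2's own script; the function names `read_fasta` and `split_by_size` are made up for illustration, and it assumes "10%" means 10% of the total base pairs, not 10% of the number of scaffolds.

```python
from typing import Dict, List, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence_name: sequence} (hypothetical helper)."""
    records: Dict[str, str] = {}
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Keep only the first word of the header as the name.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def split_by_size(lengths: Dict[str, int],
                  fraction: float = 0.10) -> Tuple[List[str], List[str]]:
    """Put the longest sequences into part 1 until it holds at least
    `fraction` of the total size; everything else goes into part 2."""
    target = fraction * sum(lengths.values())
    # Longest first; ties are broken by name so the result is reproducible.
    ordered = sorted(lengths, key=lambda n: (-lengths[n], n))
    part1: List[str] = []
    part2: List[str] = []
    running = 0
    for name in ordered:
        if running < target:
            part1.append(name)
            running += lengths[name]
        else:
            part2.append(name)
    return part1, part2


if __name__ == "__main__":
    demo = read_fasta(">s1 demo\nACGTACGT\n>s2\nACGT\n>s3\nAC\n")
    sizes = {name: len(seq) for name, seq in demo.items()}
    print(split_by_size(sizes, fraction=0.10))
```

Note that this greedy split can overshoot the target: if the single longest scaffold is already more than 10% of the genome, part 1 ends up larger than 10%. Changing `fraction` to 0.05 or 0.15 gives the other splits mentioned above.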