pinskylab / genomics

Wrangling of genomic data and identity analysis

how to remove recaptured fish from genepop #19

Closed mstuart1 closed 5 years ago

mstuart1 commented 5 years ago

In the case of recaptured fish, we observed and measured a fish in a specific year and then captured it again in a different year, when it was potentially a different size, color, or sex.

If I remove all but one observation of the fish, it will make it look like that fish was only ever captured in one year and was only ever one size.

This seems like a loss of valuable information, and the loss of potential offspring or parents because it can only be in one pool.

How do we make sure that a fish that was a juvenile in 2012 but a multi-captured breeding adult in 2017 is in the correct pool for each analysis?

katcatalano commented 5 years ago

To be clear, this is removing regenotypes from only the genepop? If so, I have a script that pulls gen_ids for each ligation_id and then assigns fish to the parents/offspring files based on the metadata for that ligation_id. So all we need is for each fish to be represented in the genepop by its best-sequenced ligation_id. Does that make sense? I saw you closed issue 11; can you please comment on that issue with the final number of loci you were able to get? Thanks!
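One way to picture the idea in this comment: the genepop keeps a single representative ligation_id per fish, while pool membership is decided from the metadata of every capture. The Python sketch below is only an illustration under stated assumptions, not the script referred to here. The `stage` field, the juvenile/adult rule, and all ids are hypothetical; the thread itself mentions only size, color, and sex as per-capture metadata.

```python
def assign_pools(captures, representative):
    """Group fish into (year, role) pools using each capture's own metadata.

    `captures` is a list of dicts with the hypothetical keys
    "ligation_id", "gen_id", "year", and "stage".
    `representative` maps gen_id -> the one ligation_id kept in the genepop.
    """
    pools = {}
    for capture in captures:
        # The fish is always referenced by its genepop representative...
        rep = representative[capture["gen_id"]]
        # ...but its role comes from what it was at this particular capture.
        role = "offspring" if capture["stage"] == "juvenile" else "parents"
        pools.setdefault((capture["year"], role), set()).add(rep)
    return pools


# Made-up example: one fish (gen_id 7) caught as a juvenile, then as an adult.
captures = [
    {"ligation_id": "L0100", "gen_id": 7, "year": 2012, "stage": "juvenile"},
    {"ligation_id": "L0450", "gen_id": 7, "year": 2017, "stage": "adult"},
]
representative = {7: "L0450"}  # assumed to be the best-sequenced ligation
pools = assign_pools(captures, representative)
print(pools)
```

In this sketch the same fish lands in the 2012 offspring pool and the 2017 parents pool, even though only one of its ligation_ids appears in the genepop.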

mstuart1 commented 5 years ago

Stopping for the night at figuring out how to identify which gen_id is repeated, which has the most loci, and how to keep only the gen_id with the most loci. Need to pull from the regeno code.

katcatalano commented 5 years ago

I have a script that does this (summing all the occurrences of “00” for an allele and then choosing the ligation_id with the fewest “00”s). If that’s all that’s left, I’m fine leaving those in because they can be taken care of by the colony prep script I use. Thanks!
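The selection described in this comment (count the missing “00” alleles for each ligation_id, then keep the ligation_id with the fewest for each gen_id) could be sketched roughly as follows. This is a Python illustration, not the script mentioned here. It assumes two-digit allele coding, so a genotype looks like "0102" and a missing allele is "00". The function names and sample data are made up.

```python
def missing_alleles(genotypes, width=2):
    """Count alleles coded as missing (all zeros) across a sample's loci.

    Assumes each genotype is a string of two alleles, each `width` digits
    wide (e.g. "0102"), so a missing allele is "00" when width is 2.
    """
    missing = "0" * width
    total = 0
    for genotype in genotypes:
        # Split on allele boundaries so "1001" is not miscounted as missing.
        alleles = [genotype[i:i + width] for i in range(0, len(genotype), width)]
        total += sum(allele == missing for allele in alleles)
    return total


def best_ligation_per_fish(samples):
    """Return {gen_id: ligation_id}, keeping the ligation with the fewest missing alleles.

    `samples` is an iterable of (ligation_id, gen_id, genotypes) tuples.
    Ties keep whichever ligation_id appears first.
    """
    best = {}
    for ligation_id, gen_id, genotypes in samples:
        n_missing = missing_alleles(genotypes)
        if gen_id not in best or n_missing < best[gen_id][1]:
            best[gen_id] = (ligation_id, n_missing)
    return {gen_id: ligation_id for gen_id, (ligation_id, _) in best.items()}


# Made-up example: gen_id 1 was genotyped twice, gen_id 2 once.
samples = [
    ("L0001", 1, ["0101", "0000", "0102"]),  # 2 missing alleles
    ("L0002", 1, ["0101", "0102", "1001"]),  # 0 missing alleles
    ("L0003", 2, ["0000", "0100", "0101"]),  # 3 missing alleles
]
kept = best_ligation_per_fish(samples)
print(kept)
```

Splitting each genotype on allele boundaries, rather than searching for the substring "00", avoids counting a genotype such as "1001" as missing.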

mstuart1 commented 5 years ago

There is a single line in dplyr that does this; I just needed to make a note for myself at the end of the day as to where I left off.

mstuart1 commented 5 years ago

Pausing to deal with genomics #10 .

mstuart1 commented 5 years ago

Recaptured fish have been removed from the genepop, and it has been written to file (genepop, vcf).