thibautjombart / adegenet

adegenet: a R package for the multivariate analysis of genetic markers

nancycats file data inconsistent with .rda #280

Open pdimens opened 4 years ago

pdimens commented 4 years ago

The nancycats.gtx file in the repo (and maybe the .dat or .gen files too) does not match the individual/population assignments you get when loading nancycats via data("nancycats").

Specifically, I found that individuals N182, N183, N184, N185, N186, N269 and N270 are assigned to their own population in the .gtx, but to P12 in the .rda.

.rda: image

.gtx (slight modification on my end): image

The .gen file does not include individual names, so it's unclear whether the inconsistency occurs there too. I'm assuming the .rda version is the correct one.

romunov commented 4 years ago

I think you're seeing this because in .gtx, two populations have the same number/name. I see

12
7
      N182 000000 136146 139145 126132 156156 142150 193199 113113 208208
      N183 149149 136146 139145 126132 156156 142142 199199 103113 208208
      N184 000000 136136 139145 126132 156156 150150 199199 103113 208208
      N185 149149 146146 139141 126126 144158 142142 193195 113113 214214
      N186 000000 136146 139141 126128 144156 142150 193193 113113 206214
      N269 123143 130150 139145 122126 150154 142150 191191 103113 206220
      N270 123137 140144 139139 122126 150150 142148 193193 113113 208218

and

12
7
      N134 135135 142146 139145 126126 150156 142142 185199 091113 208208
      N135 129145 130140 137137 116126 158158 142148 193199 113113 208208
      N136 129141 136136 145145 126126 144150 142142 193193 113113 182216
      N137 137141 136146 143143 126126 158158 148148 189199 113113 182216
      N138 129135 136146 137141 126126 144150 142142 193199 113117 182208
      N139 129137 136142 137137 126126 150160 142142 193195 113113 208216
      N140 129141 136136 139145 120126 144160 142150 193193 113113 182182

which is reflected in the nancycats dataset imported via the data mechanism.

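library(adegenet)
data(nancycats)

# Pair each individual name with its population label
xy <- data.frame(inds = rownames(tab(nancycats)),
                 pop = pop(nancycats))

# Show every individual assigned to population P12
xy[xy$pop == "P12", ]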

    inds pop
158 N134 P12
159 N135 P12
160 N136 P12
161 N137 P12
162 N138 P12
163 N139 P12
164 N140 P12
218 N182 P12
219 N183 P12
220 N184 P12
221 N185 P12
222 N186 P12
223 N269 P12
224 N270 P12
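
The duplicated label can also be spotted in the text itself, without adegenet. Below is a minimal base R sketch. It assumes each population block is laid out as in the two excerpts above (a label line, a line with the number of individuals, then that many individual lines); that layout is inferred from the excerpts, not from a format specification. `find_pop_blocks` is a hypothetical helper, not an adegenet function, and the input holds only the two blocks quoted above, not the whole file.

```r
# The two blocks quoted above, copied verbatim (leading spaces dropped).
gtx_lines <- c(
  "12", "7",
  "N182 000000 136146 139145 126132 156156 142150 193199 113113 208208",
  "N183 149149 136146 139145 126132 156156 142142 199199 103113 208208",
  "N184 000000 136136 139145 126132 156156 150150 199199 103113 208208",
  "N185 149149 146146 139141 126126 144158 142142 193195 113113 214214",
  "N186 000000 136146 139141 126128 144156 142150 193193 113113 206214",
  "N269 123143 130150 139145 122126 150154 142150 191191 103113 206220",
  "N270 123137 140144 139139 122126 150150 142148 193193 113113 208218",
  "12", "7",
  "N134 135135 142146 139145 126126 150156 142142 185199 091113 208208",
  "N135 129145 130140 137137 116126 158158 142148 193199 113113 208208",
  "N136 129141 136136 145145 126126 144150 142142 193193 113113 182216",
  "N137 137141 136146 143143 126126 158158 148148 189199 113113 182216",
  "N138 129135 136146 137141 126126 144150 142142 193199 113117 182208",
  "N139 129137 136142 137137 126126 150160 142142 193195 113113 208216",
  "N140 129141 136136 139145 120126 144160 142150 193193 113113 182182"
)

# Hypothetical helper: walk the lines block by block, assuming the layout
# "label, number of individuals, then that many individual lines".
find_pop_blocks <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  labels <- character(0)
  members <- list()
  i <- 1
  while (i <= length(lines)) {
    label <- lines[i]
    n <- as.integer(lines[i + 1])
    ind_lines <- lines[seq(i + 2, length.out = n)]
    # The individual name is the first whitespace-separated field
    ids <- sub("\\s.*$", "", ind_lines)
    labels <- c(labels, label)
    members[[length(members) + 1]] <- ids
    i <- i + 2 + n
  }
  list(labels = labels, members = members)
}

blocks <- find_pop_blocks(gtx_lines)

# Labels used by more than one block
dup_labels <- unique(blocks$labels[duplicated(blocks$labels)])

# Individuals that end up pooled under each duplicated label
merged <- lapply(dup_labels, function(l) {
  unlist(blocks$members[blocks$labels == l])
})
names(merged) <- dup_labels

dup_labels
merged
```

On these two blocks the sketch reports label 12 as duplicated and lists the same 14 individuals that appear under P12 in the output above.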

I hope @thibautjombart can chip in. Also, we (I) could pretty up the nancycats.gtx file to be more informative. I.e. the first entry probably represents number of loci, not E:\tibo\THESE\chatsNancy\nancy.gtx. :)