srvarey / gbif-occurrencestore

Automatically exported from code.google.com/p/gbif-occurrencestore

Rank is null on TC and TN #13

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Many ranks are null on TC and TN, which is not allowed.

See r512, which needs to be reverted once the issue is actually solved.

Original issue reported on code.google.com by timrobertson100 on 22 Apr 2011 at 6:04

GoogleCodeExporter commented 9 years ago

Original comment by oliver.m...@gmail.com on 26 Apr 2011 at 8:55

GoogleCodeExporter commented 9 years ago
Not seeing any null ranks on TC.

340730 out of 6137571 ranks in TN are null (~6%). Investigating.

Original comment by lars.fra...@gmail.com on 26 Apr 2011 at 2:13

GoogleCodeExporter commented 9 years ago
340639 out of 6051987 Taxon Names coming from CLB have NULL ranks. That leaves 
91 other NULLs. 0 are coming from occ_taxon_name so the rest must be from 
typification_record.

Original comment by lars.fra...@gmail.com on 26 Apr 2011 at 2:25
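
The subtraction behind the "91 other NULLs" figure can be sketched as below. This is only a check of the arithmetic, assuming the counts quoted in the two comments above come from the same snapshot of TN; the variable names are made up for illustration.

```python
# Counts quoted in the two comments above.
total_null_ranks = 340730   # NULL ranks across all of TN
total_names = 6137571       # all rows in TN
clb_null_ranks = 340639     # NULL ranks on names coming from CLB
occ_null_ranks = 0          # NULL ranks coming from occ_taxon_name

# Whatever is not explained by CLB or occ_taxon_name has to come from
# the remaining source, typification_record.
remaining_null_ranks = total_null_ranks - clb_null_ranks - occ_null_ranks

# Share of TN rows with a NULL rank, as a percentage.
null_share_percent = round(100 * total_null_ranks / total_names, 2)

print(remaining_null_ranks)
print(null_share_percent)
```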

GoogleCodeExporter commented 9 years ago
See:

http://code.google.com/p/gbif-ecat/source/detail?r=3735
http://code.google.com/p/gbif-ecat/source/detail?r=3736

No NULL ranks should come from CLB (clb3) now

Original comment by timrobertson100 on 26 Apr 2011 at 3:27

GoogleCodeExporter commented 9 years ago
NULL ranks from typification_record gone as well with r523

Original comment by lars.fra...@gmail.com on 26 Apr 2011 at 5:51

GoogleCodeExporter commented 9 years ago
Of 7.2 million names, I see 4.3 million with a null rank.
Reopening this issue.

select rank,count(1) from tim_rollover2_temp_normalized group by rank gives the 
following.  Note the 3 columns, again suggesting a corrupt table.  Perhaps the 
delimiter change in NormalizeTaxonomy needs to be rolled back?

C       8668
C   2003488 1
C   3236611 1
C   5257628 1
C   5257963 1
C   5273992 1
C   5308109 1
C   5308444 1
C   5933889 1
C   5941511,5941512 1
C   5941590,5941591 1
C   5947566,5947567 1
C   5947577 1
C   7085610 1
C   7154496 1
C   7154533,7154532 1
C   876685  1
C   876731  1
F       199148
F   121257  1
F   122050  1
F   1286407 1
F   1508184 1
F   1533035 1
F   1538105 1
F   1538295 1
F   1538341 1
F   1663126 1
F   1663127 1
F   1663128 1
F   1663129 1
F   1663130 1
F   1663131 1
F   1663132 1
F   1663133 1
F   1663134 1
F   1663135 1
F   1663136 1
F   1663137 1
F   1663138 1
F   1663139 1
F   1663140 1
F   1663141 1
F   1663142 1
F   1663143 1
F   1663144 1
F   1663145 1
F   1663146 1
F   1663147 1
F   1663148 1
F   1663149 1
F   1663150 1
F   1663151 1
F   1663152 1
F   1663153 1
F   1663154 1
F   1663155 1
F   1663156 1
F   1663157 1
F   1663158 1
F   1663159 1
F   1663160 1
F   1663161 1
F   1663162 1
F   1663163 1
F   1663164 1
F   1663165 1
F   1663166 1
F   1663167 1
F   1663168 1
F   1663169 1
F   1663170 1
F   1663171 1
F   1663172 1
F   1663173 1
F   1663174 1
F   1663175 1
F   1663176 1
F   1663177 1
F   1663178 1
F   1663179 1
F   1663180 1
F   1663181 1
F   1663182 1
F   1663183 1
F   1663184 1
F   1663185 1
F   1663186 1
F   1663187 1
F   1663188 1
F   1663189 1
F   1663190 1
F   1663191 1
F   1663192 1
F   1663193 1
F   1663194 1
F   1663195 1
F   1663196 1
F   1663197 1
F   1663198 1

Original comment by timrobertson100 on 28 Apr 2011 at 8:22
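
For comparison, a healthy table should return exactly two columns per row from that group-by: the rank and its count. The sketch below runs the same query shape against a tiny in-memory SQLite table. Everything in it is an assumption for illustration: the real table lives in Hive, the table name `normalized` is invented, and the sample names other than Calyptrosphaera are placeholders.

```python
import sqlite3

# Stand-in for the Hive table; only the two relevant columns are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE normalized (name TEXT, rank TEXT)")
conn.executemany(
    "INSERT INTO normalized VALUES (?, ?)",
    [
        ("Calyptrosphaera", "G"),
        ("placeholder genus", "G"),
        ("placeholder family", "F"),
        ("placeholder without rank", None),
    ],
)

# Same shape as the query in the comment above.
rows = conn.execute(
    "SELECT rank, COUNT(1) FROM normalized GROUP BY rank ORDER BY rank"
).fetchall()

for row in rows:
    # Every result row is a (rank, count) pair; a third value would mean
    # the rank field itself contains stray delimited data.
    print(row)
```

Each result row has two values, so output like `C   2003488 1` points at rank values that still contain an embedded delimiter rather than at a problem with the query.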

GoogleCodeExporter commented 9 years ago
Please see the tim_rollover2_temp_normalized table in Hue.  It has an extra 
NULL column at the end of each row, which is likely causing this issue:

data_resource_id    local_id    local_parent_id name    author  rank    denormalized_taxonomy_ids
10001   1   NULL    Calyptrosphaera NULL    G       NULL

Original comment by timrobertson100 on 28 Apr 2011 at 8:26
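
One way to check raw rows for surplus fields is to split each line and compare the field count against the schema. The sketch below is a minimal version of that check. It assumes the underlying file is tab-delimited and that NULLs appear as the literal text shown in Hue; the helper name and the second sample row (with a trailing tab) are hypothetical, and the thread does not establish the exact shape of the corrupt rows.

```python
# Column names as shown in the Hue header above.
EXPECTED_COLUMNS = [
    "data_resource_id", "local_id", "local_parent_id",
    "name", "author", "rank", "denormalized_taxonomy_ids",
]

def field_count_mismatch(line, delimiter="\t"):
    """Return how many fields the line has beyond the schema
    (negative if it has fewer)."""
    fields = line.rstrip("\n").split(delimiter)
    return len(fields) - len(EXPECTED_COLUMNS)

# Row copied from the comment above, written with tab delimiters.
good_row = "10001\t1\tNULL\tCalyptrosphaera\tNULL\tG\tNULL\n"
# Hypothetical row with one trailing delimiter, which adds an empty field.
trailing_delimiter_row = "10001\t1\tNULL\tCalyptrosphaera\tNULL\tG\tNULL\t\n"

print(field_count_mismatch(good_row))
print(field_count_mismatch(trailing_delimiter_row))
```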

GoogleCodeExporter commented 9 years ago
See the revert in r533 and r534

This is needed because the MR job uses TextOutputFormat, which writes the \t 
character between key and value

Tests running

Original comment by timrobertson100 on 28 Apr 2011 at 8:35
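
Why a delimiter change inside the job clashes with the output format can be sketched as follows. Hadoop's TextOutputFormat writes each record as key, separator (tab by default), value, newline. The Python below only imitates that behaviour; the function names are made up, and the Ctrl-A delimiter is a hypothetical stand-in, since the thread does not say which delimiter the change introduced.

```python
def write_record(key, value, separator="\t"):
    """Imitate TextOutputFormat: key, separator, value, newline."""
    return f"{key}{separator}{value}\n"

def read_fields(line, field_delimiter):
    """Split a line the way a delimited-text table reader would."""
    return line.rstrip("\n").split(field_delimiter)

# Hypothetical: the job joins the value's fields with Ctrl-A instead of tab.
mixed_value = "\x01".join(["Calyptrosphaera", "G"])
mixed_line = write_record("10001", mixed_value)

# Neither delimiter splits the mixed line into three clean fields.
as_ctrl_a = read_fields(mixed_line, "\x01")
as_tab = read_fields(mixed_line, "\t")

# With tab used throughout, key and value fields line up as expected.
tab_line = write_record("10001", "\t".join(["Calyptrosphaera", "G"]))
all_tab = read_fields(tab_line, "\t")

print(as_ctrl_a)
print(as_tab)
print(all_tab)
```

Keeping tab as the field delimiter inside the value means the separator written by the output format and the delimiter the table expects are the same character.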

GoogleCodeExporter commented 9 years ago
Confirmed reverting fixes the OR names issue - marking as fixed again

Original comment by timrobertson100 on 29 Apr 2011 at 4:59