identifiers-org / identifiers-org.github.io

Regex RefSeq #176

Closed JosuaCarl closed 1 year ago

JosuaCarl commented 2 years ago

The regular expression pattern for RefSeq prefixes seems to be outdated. On the NCBI website, the following entry can be found:

iron-containing alcohol dehydrogenase [Finegoldia magna] NCBI Reference Sequence: WP_012290939.1

However, the RefSeq ID WP_012290939.1 is rejected by the regular expression pattern, because WP is not included as a valid prefix. The current regex `^(((AC|AP|NC|NG|NM|NP|NR|NT|NW|XM|XP|XR|YP|ZP)_\d+)|(NZ_[A-Z]{2,4}\d+))(.\d+)?$` should therefore either add `|WP` to the prefix list or replace the prefix list with `[A-Z]{2}`.
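
To make the report easier to reproduce, here is a minimal Python sketch comparing the current pattern with the two proposed changes. It assumes the pattern is the one quoted above with the underscore after the two-letter prefix written out. The names `CURRENT`, `WITH_WP`, `GENERIC` and `is_valid` are made up for this example and are not part of the identifiers.org code base.

```python
import re

# Pattern as quoted in this issue (assumption: an underscore follows the prefix).
CURRENT = r"^(((AC|AP|NC|NG|NM|NP|NR|NT|NW|XM|XP|XR|YP|ZP)_\d+)|(NZ_[A-Z]{2,4}\d+))(.\d+)?$"

# Proposal 1: add WP to the list of accepted prefixes.
WITH_WP = r"^(((AC|AP|NC|NG|NM|NP|NR|NT|NW|WP|XM|XP|XR|YP|ZP)_\d+)|(NZ_[A-Z]{2,4}\d+))(.\d+)?$"

# Proposal 2: accept any two uppercase letters as the prefix.
GENERIC = r"^(([A-Z]{2}_\d+)|(NZ_[A-Z]{2,4}\d+))(.\d+)?$"


def is_valid(pattern: str, accession: str) -> bool:
    """Return True if the accession matches the given pattern."""
    return re.match(pattern, accession) is not None


if __name__ == "__main__":
    accession = "WP_012290939.1"  # the ID from the NCBI entry above
    for name, pattern in [("current", CURRENT), ("with WP", WITH_WP), ("generic", GENERIC)]:
        print(f"{name}: {is_valid(pattern, accession)}")
```

The second proposal is more permissive: it would also accept two-letter prefixes that RefSeq does not use, so adding `WP` explicitly is the narrower fix.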

renatocjn commented 1 year ago

I have added the WP prefix to the current regex. Thank you for bringing the issue to us.