Closed victorlin closed 3 days ago
Aaand of course this is tied to https://github.com/nextstrain/fauna/issues/113...
Trial run of our flu_upload with Python 3.10 runs into error:
ImportError: Pandas requires version '2.0.1' or newer of 'xlrd' (version '1.2.0' currently installed).
I was able to install packages without any issues because xlrd
is an optional dependency that is only checked with pandas[excel]
. Installing with pandas[excel] >=1.1.5, <3
and xlrd >=1.0.0, ==1.*
keeps pandas at v1.5.3. because the extra excel
was only added in v2.
Upgrading xlrd
to v2 shouldn't be an issue for sequence uploads since GISAID exports .xls
files.
However, titer data come as .xlsx
files, which does require xlrd
v1 with Python 3.7
We can make this version difference explicit with environment markers:
diff --git a/requirements.txt b/requirements.txt
index 5694eed..5964ad0 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,8 +1,10 @@
biopython >=1.73, ==1.*
boto >=2.38, ==2.*
certifi
-pandas >=1.1.5, ==1.*
+pandas[excel] >=2.0.0, <3; python_version>="3.8"
rethinkdb >=2.4.8, ==2.4.*, !=2.4.10
requests >=2.20.0, ==2.*
unidecode >=1.0.22, ==1.*
-xlrd >=1.0.0, ==1.*
+# Use Python 3.7 for tbd uploading scripts
+pandas >=1.1.5, ==1.*; python_version<"3.8"
+xlrd >=1.0.0, ==1.*; python_version<"3.8"
Depending on how many issues we run into with replacing use of xlrd
with excelrd
(as suggested in https://github.com/nextstrain/fauna/issues/113#issuecomment-1040878351), we can also change the dependencies to
diff --git a/requirements.txt b/requirements.txt
index 5694eed..7cc1d7d 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,8 +1,8 @@
biopython >=1.73, ==1.*
boto >=2.38, ==2.*
certifi
-pandas >=1.1.5, ==1.*
+pandas[excel] >=2.0.0, <3
rethinkdb >=2.4.8, ==2.4.*, !=2.4.10
requests >=2.20.0, ==2.*
unidecode >=1.0.22, ==1.*
-xlrd >=1.0.0, ==1.*
+excelrd >=3.0.0, <4
Thanks for digging into this @joverlee521! Stepping back a bit, a couple points:
The Nextstrain runtimes don't need to have pandas v2 any time soon.
Gotcha! Didn't want fauna to be a blocker here...
Is it worth considering decoupling Fauna from the Nextstrain runtimes (e.g. using an extended version of the base image when working with Fauna)?
Yes! It might makes sense to move it out of base image since it is only used by seasonal-flu and avian-flu. It should be straightforward to point the automated GH Action workflows to use the extended image. However, it might be extra burden on users to remember to use the correct image when running things manually.
Decoupling seems like the better option, I will write up a separate issue and link it soon.
Feel free to close this one as not planned.
In the process of nextstrain/augur#1473, it was noticed that pandas dependency version constraint in this repo is preventing Nextstrain runtimes from resolving to pandas version 2.
https://github.com/nextstrain/fauna/blob/d632af090c19f2601d99223a6c23047858a82a15/requirements.txt#L4
It's possible that the code is already compatible with pandas v2, in which case the fix is as simple as changing that line to
pandas >=1.1.5, <3
(also mentioned in https://github.com/nextstrain/augur/issues/1473#issuecomment-2475038728).Here are some resources for the migration: