pypi / support

Issue tracker for support requests related to using https://pypi.org

File Limit Request: samba_sampler - 175 MB #3026

Closed by tresoldi 1 year ago

tresoldi commented 1 year ago

Project URL

https://pypi.org/project/samba_sampler

New Limit

175 MB

Which indexes

PyPI, TestPyPI

About the project

samba_sampler is an ongoing project that provides better sampling methods in general, with a particular focus on linguistic typology. It attempts to address issues of vertical and spatial autocorrelation (i.e., Galton's problem [1]). It is already being used by two Ph.D. students at the University of Uppsala (Sweden) and will be submitted for peer-reviewed publication in about a month.

[1] https://en.wikipedia.org/wiki/Galton%27s_problem

Reasons for the request

So that the package can be used without complex installation instructions (the target audience is not necessarily proficient with computers), I have decided to distribute it with all the data necessary for normal usage: (a) a custom dump of Glottolog's [2] data, (b) a pre-computed distance matrix from GLED's "world tree" [3], (c) a pre-computed matrix of Haversine distances, and (d) a pre-computed matrix of walking distances, adapted from Guzman Naranjo & Jäger (2023) [4]. Distributing this data matters both for computation speed and for ease of access. The matrices are square and cover over 8,000 different language varieties, all with geographic coordinates.
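To illustrate item (c), a matrix of Haversine distances could be pre-computed roughly as follows. This is a minimal sketch and not the project's code: the function names, the mean Earth radius of 6371 km, and the rounding to whole kilometres are all assumptions made for the example.

```python
import math
from array import array

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def distance_matrix(coords):
    """Flat, row-major n x n matrix of rounded distances (hypothetical layout).

    Half the Earth's circumference is about 20,015 km, so every value fits
    in an unsigned 16-bit integer (typecode "H").
    """
    n = len(coords)
    flat = array("H", [0]) * (n * n)
    for i in range(n):
        for j in range(i + 1, n):
            d = round(haversine_km(*coords[i], *coords[j]))
            flat[i * n + j] = d  # fill both halves: the matrix is symmetric
            flat[j * n + i] = d
    return flat
```

With over 8,000 varieties the double loop runs tens of millions of times, which is one reason to ship the result rather than compute it at install time or first use.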

To reduce the package size, I have implemented a custom class [5] that uses Python's arrays instead of (pickled) lists and sets the datatype to unsigned integers; the files are also compressed with the highest protocol using bzip2.
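The general idea of storing unsigned integers in a typed array and compressing the raw bytes with bzip2 can be sketched as below. This is only an illustration under stated assumptions, not the class in the linked repository: `pack_matrix` and `unpack_matrix` are hypothetical names, and the typecode "H" (unsigned 16-bit) and compression level 9 are choices made for the example.

```python
import bz2
from array import array


def pack_matrix(values, typecode="H"):
    """Store values as a typed array and bzip2-compress the raw bytes."""
    flat = array(typecode, values)
    # compresslevel=9 is the strongest bzip2 setting.
    return bz2.compress(flat.tobytes(), compresslevel=9)


def unpack_matrix(blob, typecode="H"):
    """Inverse of pack_matrix: decompress and rebuild the typed array."""
    flat = array(typecode)
    flat.frombytes(bz2.decompress(blob))
    return flat


# Round trip on a small example.
original = [0, 120, 4500, 120, 0, 3300, 4500, 3300, 0]
blob = pack_matrix(original)
restored = unpack_matrix(blob)
print(list(restored) == original)
```

Note that `tobytes()` uses the machine's native byte order, so a file written this way is only portable between platforms with the same endianness unless the loader swaps bytes (for example with `array.byteswap()`).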

Unfortunately, even with these measures I am now over the 100 MB limit. I am requesting a new limit of 175 MB, which should be more than enough to fit the package once all the matrices are integrated (I estimate the final size will be about 130 MB).
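For a sense of scale, the uncompressed size of one such matrix can be estimated with simple arithmetic. The sketch below assumes exactly 8,000 varieties and 2 bytes per entry; both figures are assumptions for illustration (the request says "over 8,000" and does not state the integer width), and the compressed size depends on the data.

```python
def raw_matrix_bytes(n, itemsize):
    """Uncompressed size in bytes of a full n x n matrix."""
    return n * n * itemsize


def triangle_bytes(n, itemsize):
    """Size if only the strict upper triangle of a symmetric matrix is kept."""
    return n * (n - 1) // 2 * itemsize


n = 8000       # assumed number of language varieties
itemsize = 2   # assumed bytes per entry (unsigned 16-bit)

print(raw_matrix_bytes(n, itemsize) / 1_000_000, "MB per full matrix")
print(triangle_bytes(n, itemsize) / 1_000_000, "MB per upper triangle")
```

Under these assumptions a single full matrix is 128 MB before compression, so three of them need substantial compression to approach the estimated package size.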

[2] https://www.glottolog.org
[3] https://doi.org/10.5281/zenodo.7368116
[4] https://doi.org/10.12688/openreseurope.16141.1
[5] https://github.com/tresoldi/samba_sampler/blob/main/src/samba_sampler/common.py

di commented 1 year ago

I've set the upload limit for samba_sampler to 200 MB on PyPI and TestPyPI. Please be mindful of the frequency of releases at that size.