internetarchive/iari
Import workflows for the Wikipedia Citations Database
GNU General Public License v3.0
12 stars · 9 forks
Issues
as a DevOps engineer I want a systemd service unit for Gunicorn that is enabled on startup, so that if the instance is restarted the service is restored automatically
#868 · dpriskorn · closed 1 year ago · 1 comment
As a data consumer I want the /check-url endpoint not to produce a 'text' property in its output
#867 · mojomonger · closed 1 year ago · 1 comment
Use testdeadlink api also
#866 · dpriskorn · closed 1 year ago · 0 comments
Test pdf debug output
#865 · dpriskorn · closed 1 year ago · 0 comments
As a developer I want to test that the debug output is only present when debug=true
#864 · dpriskorn · closed 1 year ago · 0 comments
More debug output
#863 · dpriskorn · closed 1 year ago · 0 comments
as a customer I want a development instance of IARI so we can experiment there without breaking IARE
#862 · dpriskorn · opened 1 year ago · 6 comments
as a customer I want to batch check urls using a POST request so I don't have to make a ton of http requests per article
#861 · dpriskorn · opened 1 year ago · 2 comments
as a patron I want the links in the Responsible AI to be found and analyzed correctly by IARI so I can trust it
#860 · dpriskorn · opened 1 year ago · 1 comment
Cleanup before release of 4.1.0
#859 · dpriskorn · closed 1 year ago · 0 comments
as a customer I want check-url to also use the testdeadlink API by default so I can see whether it reliably solves the 4xx status code issues
#858 · dpriskorn · closed 1 year ago · 2 comments
as a customer/consumer/developer I want to see the debug output from the /pdf endpoint, with as many PyMuPDF functions as we can, so we can solve PDF link-parsing problems
#857 · mojomonger · closed 1 year ago · 3 comments
Rewrite link extraction from text
#856 · dpriskorn · closed 1 year ago · 0 comments
as a developer I want to document that we don't store the debug information in the pdf endpoint so the customers don't get confused
#855 · dpriskorn · closed 1 year ago · 1 comment
As a consumer I want the DeSantis PDF file to accurately show all the links in the document
#854 · mojomonger · opened 1 year ago · 2 comments
As a customer I want the backend to try harder to find links in text, also by removing spaces, to catch more edge cases
#853 · dpriskorn · closed 1 year ago · 0 comments
as a customer I want the pdf endpoint to also try to find links by parsing the XHTML generated by PyMuPDF
#852 · dpriskorn · opened 1 year ago · 0 comments
as a customer I want the pdf endpoint to also output the XHTML generated by PyMuPDF when the debug parameter is specified
#851 · dpriskorn · closed 1 year ago · 1 comment
Fix json encoding bug
#850 · dpriskorn · closed 1 year ago · 0 comments
as a data consumer I want the pdf endpoint to resolve weird code points found in pdfs
#849 · dpriskorn · opened 1 year ago · 0 comments
Support debug output
#848 · dpriskorn · closed 1 year ago · 0 comments
as a patron I want the pdf endpoint to support fetching PDFs from Commons
#847 · dpriskorn · opened 1 year ago · 0 comments
as a developer I want to make sure we can also fetch PDFs from the Wayback Machine
#846 · dpriskorn · opened 1 year ago · 0 comments
as a customer I want to pass a debug parameter and get the raw text and annotations so I can understand better why it didn't find the URLs
#845 · dpriskorn · closed 1 year ago · 0 comments
as a patron I want the pdf endpoint to extract all URLs from the Global Connectivity Report so I can check their status
#844 · mojomonger · opened 1 year ago · 5 comments
As a patron I want link extraction to also support FTP
#843 · dpriskorn · opened 1 year ago · 0 comments
as a tool developer I want a new endpoint that I can send a chunk of wikitext for a citation and get a complete analysis back
#842 · dpriskorn · opened 1 year ago · 0 comments
Fix forward slash bug
#841 · dpriskorn · closed 1 year ago · 0 comments
fix slash in name extraction
#840 · dpriskorn · closed 1 year ago · 0 comments
as a developer I want to parse "safe links" like these to avoid confusing the patron
#839 · dpriskorn · opened 1 year ago · 0 comments
Fix fld counts
#838 · dpriskorn · closed 1 year ago · 0 comments
as a developer I want to remove trailing forward slashes from the name of references
#837 · dpriskorn · closed 1 year ago · 0 comments
as a developer I want more tests of the fld extraction to understand better when it fails
#836 · dpriskorn · opened 1 year ago · 0 comments
Improve Reference class
#835 · dpriskorn · closed 1 year ago · 0 comments
as a customer I want the domain counts for an article to reflect unique domains per reference to avoid confusion
#834 · dpriskorn · closed 1 year ago · 1 comment
as a developer I want to update endpoint output documentation in the readme to include detected_language, url_objects and wikitext
#833 · dpriskorn · opened 1 year ago · 0 comments
Add wikitext output on reference in article endpoint
#832 · dpriskorn · closed 1 year ago · 0 comments
as a customer I want the wikitext of references from the article endpoint so I can get everything I need in one request to build a database of references
#831 · dpriskorn · closed 1 year ago · 0 comments
Add language detection to 3 endpoints
#830 · dpriskorn · closed 1 year ago · 0 comments
Cleanup repository
#829 · dpriskorn · closed 1 year ago · 0 comments
as a patron I want to know if a url is valid or not according to IARI so I can go fix it if not
#828 · dpriskorn · closed 1 year ago · 1 comment
Fix erroneous FLD and URL extraction
#827 · dpriskorn · closed 1 year ago · 0 comments
Support revisions in article endpoint
#826 · dpriskorn · closed 1 year ago · 0 comments
Support parsing Wayback Machine urls in the fld parser
#825 · dpriskorn · closed 1 year ago · 0 comments
as a developer I want to remove all deprecated code related to Wikibase to have a lean and focused codebase
#824 · dpriskorn · closed 1 year ago · 0 comments
As a developer I want to compile the regex used in the pdf endpoint to speed up the parsing
#823 · dpriskorn · closed 1 year ago · 1 comment
As a developer I want to implement catching errors from the pdf parser and test using corrupted PDFs to make the API more stable
#822 · dpriskorn · opened 1 year ago · 1 comment
As a developer I want to validate the URLs found in the pdf endpoint to make sure they are valid before returning them to the patron
#821 · dpriskorn · closed 1 year ago · 1 comment
As a developer I want to move all example URLs into the git repository so our customers can easily find them
#820 · dpriskorn · closed 1 year ago · 2 comments
Have /references endpoint accept article URL
#819 · harej · closed 1 year ago · 6 comments