internetarchive iari issues

internetarchive / iari

Import workflows for the Wikipedia Citations Database

GNU General Public License v3.0

12 stars, 9 forks

issues

as a devop I want a systemd unit service for gunicorn that is enabled on startup so if the instance is restarted the service is restored automatically

#868 dpriskorn closed 1 year ago
1
As a data consumer I want /check-url endpoint to not produce 'text' property in output

#867 mojomonger closed 1 year ago
1
Use testdeadlink api also

#866 dpriskorn closed 1 year ago
0
Test pdf debug output

#865 dpriskorn closed 1 year ago
0
As a developer I want to test that the debug output is only present when debug=true

#864 dpriskorn closed 1 year ago
0
More debug output

#863 dpriskorn closed 1 year ago
0
as a customer I want a development instance of IARI so we can experiment there without breaking IARE

#862 dpriskorn opened 1 year ago
6
as a customer I want to batch check urls using a POST request so I don't have to make a ton of http requests per article

#861 dpriskorn opened 1 year ago
2
as a patron I want the links in the Responsible AI to be found and analyzed correctly by IARI so I can trust it

#860 dpriskorn opened 1 year ago
1
Cleanup before release of 4.1.0

#859 dpriskorn closed 1 year ago
0
as a customer I want the check-url to support using also the testdeadlink API by default so I can see if it solves the 4xx status code issues reliably

#858 dpriskorn closed 1 year ago
2
as a customer/consumer/developer I want to see the debug output from the /pdf endpoint with as many PyMuPDF functions as we can so we can solve PDF link parsing problems

#857 mojomonger closed 1 year ago
3
Rewrite link extraction from text

#856 dpriskorn closed 1 year ago
0
as a developer I want to document that we don't store the debug information in the pdf endpoint so the customers don't get confused

#855 dpriskorn closed 1 year ago
1
As a consumer I want the DeSantis pdf file to accurately show all the links in the document

#854 mojomonger opened 1 year ago
2
As a customer I want the backend to try harder finding links from text by removing spaces also to catch more edge cases

#853 dpriskorn closed 1 year ago
0
as a customer I want the pdf endpoint to also try to find links by parsing the xhtml generated by pymupdf

#852 dpriskorn opened 1 year ago
0
as a customer I want the pdf endpoint to also output the xhtml generated by pymupdf when the debug parameter is specified

#851 dpriskorn closed 1 year ago
1
Fix json encoding bug

#850 dpriskorn closed 1 year ago
0
as a data consumer I want the pdf endpoint to resolve weird code points found in pdfs

#849 dpriskorn opened 1 year ago
0
Support debug output

#848 dpriskorn closed 1 year ago
0
as a patron I want the pdf endpoint to support fetching pdfs from commons

#847 dpriskorn opened 1 year ago
0
as a developer I want to make sure we can fetch pdfs from the wayback machine also

#846 dpriskorn opened 1 year ago
0
as a customer I want to pass a debug parameter and get the raw text and annotations so I can understand better why it didn't find the URLs

#845 dpriskorn closed 1 year ago
0
as a patron I want the pdf endpoint to extract all urls from Global Connectivity Report so I can check their status

#844 mojomonger opened 1 year ago
5
As a patron I want link extraction to support ftp also

#843 dpriskorn opened 1 year ago
0
as a tool developer I want a new endpoint that I can send a chunk of wikitext for a citation and get a complete analysis back

#842 dpriskorn opened 1 year ago
0
Fix forward slash bug

#841 dpriskorn closed 1 year ago
0
fix slash in name extraction

#840 dpriskorn closed 1 year ago
0
as a developer I want to parse "safe links" like these to avoid confusing the patron

#839 dpriskorn opened 1 year ago
0
Fix fld counts

#838 dpriskorn closed 1 year ago
0
as a developer I want to remove trailing forward slashes from the name of references

#837 dpriskorn closed 1 year ago
0
as a developer I want more tests of the fld extraction to understand better when it fails

#836 dpriskorn opened 1 year ago
0
Improve Reference class

#835 dpriskorn closed 1 year ago
0
as a customer I want the domain counts for an article to reflect unique domains per reference to avoid confusion

#834 dpriskorn closed 1 year ago
1
as a developer I want to update endpoint output documentation in the readme to include detected_language, url_objects and wikitext

#833 dpriskorn opened 1 year ago
0
Add wikitext output on reference in article endpoint

#832 dpriskorn closed 1 year ago
0
as a customer I want the wikitext of references from the article endpoint so I can get everything I need in one request to build a database of references

#831 dpriskorn closed 1 year ago
0
Add language detection to 3 endpoints

#830 dpriskorn closed 1 year ago
0
Cleanup repository

#829 dpriskorn closed 1 year ago
0
as a patron I want to know if a url is valid or not according to IARI so I can go fix it if not

#828 dpriskorn closed 1 year ago
1
Fix erroneous FLD and URL extraction

#827 dpriskorn closed 1 year ago
0
Support revisions in article endpoint

#826 dpriskorn closed 1 year ago
0
Support parsing Wayback Machine urls in the fld parser

#825 dpriskorn closed 1 year ago
0
as a developer I want to remove all deprecated code related to Wikibase to have a lean and focused codebase

#824 dpriskorn closed 1 year ago
0
As a developer I want to compile the regex used in the pdf endpoint to speed up the parsing

#823 dpriskorn closed 1 year ago
1
As a developer I want to implement catching errors from the pdf parser and test using corrupted PDFs to make the API more stable

#822 dpriskorn opened 1 year ago
1
As a developer I want to validate the URLs found in the pdf endpoint to make sure they are valid before returning them to the patron

#821 dpriskorn closed 1 year ago
1
As a developer I want to move all example urls into the git repository so our customers can easily find it

#820 dpriskorn closed 1 year ago
2
Have /references endpoint accept article URL

#819 harej closed 1 year ago
6