jzillmann / pdf-to-markdown

A PDF to Markdown converter
https://pdf2md.morethan.io
MIT License
1.14k stars 184 forks

Read PDFs from URL option #25

Closed: jzillmann closed this issue 3 years ago

jzillmann commented 3 years ago

Currently we can:

  1. Drop or browse PDFs
  2. Open the Example.pdf

It would be nice to have a third option where one can enter a URL.

PDF.js can already load from a URL (we're already doing that for (2)), so this should be purely a UI change.
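
Since the work is described as UI-only, one small piece worth sketching is checking what the user typed before it reaches the loader. The following is a minimal sketch, not code from the repository: the function name `normalizePdfUrl` and the decision to accept only absolute http(s) URLs are assumptions made for illustration.

```typescript
// Hypothetical helper for a URL input field; not taken from the pdf-to-markdown code base.
// Returns a normalized absolute URL, or null when the input cannot be used.
function normalizePdfUrl(input: string): string | null {
  const trimmed = input.trim();
  if (trimmed.length === 0) {
    return null; // nothing entered
  }
  let url: URL;
  try {
    url = new URL(trimmed); // throws on anything that is not an absolute URL
  } catch {
    return null;
  }
  // Assumption: only http(s) sources make sense for a browser-side loader.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return null;
  }
  return url.href;
}
```

A `null` result would let the UI show a validation hint instead of starting a load that is certain to fail.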

darkcheftar commented 3 years ago

Sire @jzillmann, I would love to contribute to this issue. I haven't contributed to any repository so far, but I'm keen to learn. Could you help me out a little? Regards, @darkcheftar.

jzillmann commented 3 years ago

Hey @darkcheftar,

great! Process-wise, I would expect you to open a pull request for this particular issue. Let me know if you need more guidance there. Code-wise, the interesting places are:

darkcheftar commented 3 years ago

Thank you, Sire @jzillmann. I will look into it and let you know soon what I'm up to!

darkcheftar commented 3 years ago

Sire @jzillmann, while working on #25 I wanted a proof of concept that passing a URL to the parsePdf function loads the PDF. I replaced the example PDF path with the URL shown below and clicked the "load example" option.

export async function loadExample(progressListener: ProgressListenFunction): Promise<any> {
-    return parsePdf('ExamplePdf.pdf', progressListener);
+    return parsePdf('https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf', progressListener);
}

I got the following error: [image]

Please help me with this.

jzillmann commented 3 years ago

Ok, so the CORS thingy... 😞 Guess this is kind of expected, see also https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-xhr

Not sure, but I guess the only reliable way to enable this feature is to use a proxy. See https://github.com/mozilla/pdf.js/issues/1000#issuecomment-133756244. Can you try that?
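
To make the proxy idea concrete, here is a minimal sketch of how a PDF URL could be rewritten before it is fetched. It is not the code from the linked comment or from this repository. The helper names (`viaCorsProxy`, `resolveForFetch`) and the proxy base URL are hypothetical. It assumes the proxy follows the common "append the target URL to the proxy's base URL" convention (as cors-anywhere does).

```typescript
// Assumption: the proxy accepts the full target URL appended to its own base URL.
function viaCorsProxy(pdfUrl: string, proxyBase: string): string {
  const base = proxyBase.endsWith('/') ? proxyBase : proxyBase + '/';
  return base + pdfUrl;
}

// Hypothetical helper: route only cross-origin URLs through the proxy.
// pageOrigin is the origin the app is served from, e.g. window.location.origin.
function resolveForFetch(pdfUrl: string, pageOrigin: string, proxyBase: string): string {
  const target = new URL(pdfUrl, pageOrigin); // relative paths resolve against the page
  if (target.origin === new URL(pageOrigin).origin) {
    return target.href; // same origin: the browser applies no CORS restriction
  }
  return viaCorsProxy(target.href, proxyBase);
}
```

Keeping the proxy base as a parameter, rather than hard-coding it, means a change of hosting for the proxy only touches configuration.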

darkcheftar commented 3 years ago

Sure, I will definitely try checking this 😄. Thanks for the support.

darkcheftar commented 3 years ago

Sire @jzillmann, please check #27, where I tried to fix #25. Let me know if there are any suggestions or changes. 🤗

jzillmann commented 3 years ago

Thanks @darkcheftar, looks good, added some minor comments on the PR! Also, I apologize for the setup mess. When checking out your fork I noticed a lot of inconveniences (errors in Visual Studio Code, missing dependencies, etc.). My plan was to do a project setup cleanup near the completion of the modularization (i.e. core should be a published NPM module), but I guess it makes sense to keep things a bit tidier until then! So let me know if something isn't working for you with the initial setup!

darkcheftar commented 3 years ago

Thanks, sire @jzillmann. I hope you are safe and sound. I will try to answer those minor comments as soon as possible. 😃

darkcheftar commented 3 years ago

> Thanks @darkcheftar, looks good, added some minor comments on the PR! Also, I apologize for the setup mess. When checking out your fork I noticed a lot of inconveniences (errors in Visual Studio Code, missing dependencies, etc.). My plan was to do a project setup cleanup near the completion of the modularization (i.e. core should be a published NPM module), but I guess it makes sense to keep things a bit tidier until then! So let me know if something isn't working for you with the initial setup!

Got it, Sire! While bringing about a great change, we need to go through a bit of suffering.

jzillmann commented 3 years ago

Changes are in, big thanks! 👍😃

darkcheftar commented 1 year ago

Hey @jzillmann, as you know, I used Heroku for this feature, and as we know they have discontinued their free tier. This feature needs some attention again.