yasserg / crawler4j

Open Source Web Crawler for Java
Apache License 2.0
4.53k stars, 1.93k forks

How to use post to submit data. #342

Open chunfytseng opened 6 years ago

chunfytseng commented 6 years ago

I need to submit data using POST.

shalipoto commented 6 years ago

Hi, I saw @jahenkor doing a POST with the crawler. You might ask him how he did that.

chunfytseng commented 6 years ago

thx.

shalipoto commented 6 years ago

@chunfytseng, when you are happy with the answers, can you close this issue please?

chunfytseng commented 6 years ago

Hi, so far I still don't know how to get data using POST, so I need your help.

jahenkor commented 6 years ago

If you need to log in to a page, you can submit your POST request using AuthInfo (BasicAuth or FormAuth) and receive cookie(s) that persist across your crawl session. However, if you want to submit and receive POST data across multiple pages, I've got no clue.

jahenkor commented 6 years ago

@chunfytseng

chunfytseng commented 6 years ago

@jahenkor Hi jahenkor, I have already used AuthInfo to log in. After logging in, I need to submit data using POST.

chunfytseng commented 6 years ago

@Chaiavi

chunfytseng commented 6 years ago

please!!!

Chaiavi commented 6 years ago

I don't understand your scenario. What are you trying to do?

chunfytseng commented 6 years ago

My scenario is as follows:

  1. Log in to the system.
  2. Get information.
  3. Submit data with POST to obtain the results.
  4. Step 3 has two possible results: if it succeeds, the task ends; if it fails, repeat step 3.

That is my usage scenario. I have already logged in successfully. Now, how do I use POST to submit data?

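
Steps 3 and 4 above can be sketched with the JDK's own `java.net.http.HttpClient` (Java 11+), sending the POST outside crawler4j. This is only a sketch under assumptions not stated in the thread: the class name `PostSubmit`, the form-encoded body, treating HTTP 200 as success, and the `maxAttempts` cap are all hypothetical, and the client passed in is assumed to already carry the logged-in session cookies.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of crawler4j.
public class PostSubmit {

    /** Encodes fields as application/x-www-form-urlencoded text. */
    public static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Builds a POST request carrying the encoded form fields. */
    public static HttpRequest buildPost(URI target, Map<String, String> fields) {
        return HttpRequest.newBuilder(target)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
    }

    /**
     * Step 3 and step 4: send the POST, and resend it while it fails.
     * Assumption: status 200 means success. Gives up after maxAttempts.
     */
    public static String submitWithRetry(HttpClient client, HttpRequest request, int maxAttempts)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() == 200) {
                return response.body();
            }
        }
        throw new IOException("POST failed after " + maxAttempts + " attempts");
    }
}
```

A caller would build the request once with `buildPost` and hand it to `submitWithRetry`. Capping the attempts avoids looping forever when the site keeps rejecting the submission.
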
chunfytseng commented 6 years ago

@Chaiavi

Chaiavi commented 6 years ago

We don't support failed logins.

So if the login fails you won't get any data from the site as you are not logged into it.

The authentication component in c4j is very basic. It was a later hack added to the crawler, so it is not very flexible.

You can try to hack the auth system to do what you want, but as it is coded now I don't see a way to attempt a login, parse the response, and then, based on that response, log in with a different user/password combination.

Chaiavi.

chunfytseng commented 6 years ago

Hi Chaiavi, I have two questions.

  1. I have solved the login failure problem. My remaining question is how to use POST to get data.
  2. If my crawler task is long-running and crawls data with multi-user logins (I handle the multi-user logins myself), would you recommend that I start multiple PageFetcher instances, each with a different config? Or do you have any better advice?

@Chaiavi

Chaiavi commented 6 years ago

No better advice.

Just be sure you don't mix sessions of different users: log in with one user, crawl, log off, then log in with a different user, and so on.

Chaiavi.
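
One way to keep user sessions from mixing is to give every user a separate cookie jar. The sketch below uses JDK classes only (`java.net.CookieManager` and `java.net.http.HttpClient`), not crawler4j's PageFetcher. The class `UserSessions` and its method names are hypothetical, and it assumes the site tracks logins through cookies.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.http.HttpClient;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of crawler4j.
public class UserSessions {

    // One cookie jar per user name, so session cookies are never shared.
    private final Map<String, CookieManager> jars = new HashMap<>();

    /** Returns the cookie jar for a user, creating an empty one on first use. */
    public CookieManager jarFor(String user) {
        return jars.computeIfAbsent(user,
                u -> new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    /** Builds an HTTP client that reads and writes only this user's cookies. */
    public HttpClient clientFor(String user) {
        return HttpClient.newBuilder()
                .cookieHandler(jarFor(user))
                .build();
    }

    /** Local "log off": forgets the user's cookies (does not call the site). */
    public void logoff(String user) {
        CookieManager jar = jars.remove(user);
        if (jar != null) {
            jar.getCookieStore().removeAll();
        }
    }
}
```

Each user then crawls through the client returned by `clientFor`, and `logoff` is called before moving on to the next user.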

chunfytseng commented 6 years ago

thx.