machinalis / iepy

Information Extraction in Python
BSD 3-Clause "New" or "Revised" License
905 stars 186 forks

Does not update preprocess and tables when content is updated. #114

Closed SindhuBairavi closed 7 years ago

SindhuBairavi commented 7 years ago

When the content is updated, only the text field in the DB changes. If I run preprocess again, it does not redo the preprocessing, so the text is modified but the preprocessed content and the subsequent tables are not. Even if i delete the record and reload the modified content as a new record, the relative tables don't get updated. How should I handle updated content?

jmansilla commented 7 years ago

Hi!

IEPY's current goals do not include automatically re-running preprocessing on modified documents.

You could add it by defining a Django post-save handler for IEDocuments and, inside it, re-running the preprocess steps that need to be redone. If you have the time and the inclination, go ahead and we can try to guide you.

Some pointers:

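from iepy.preprocess.pipeline import PreProcessPipeline

# ProcessStepA ... ProcessStepN are placeholders for the step runners your instance uses.
pipeline = PreProcessPipeline([
    ProcessStepA(override=True),  # override=True asks the step to run again
    ProcessStepB(override=True),
    # ... (remaining steps)
    ProcessStepN(),
], document)
pipeline.process_everything()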
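
To make the post-save idea concrete, here is a minimal sketch in plain Python; it is not IEPY's own code. `make_reprocess_handler` and `build_pipeline` are hypothetical names, and `build_pipeline` is assumed to return a pipeline like the one above for the document that was just saved. The guard is there on the assumption that preprocess steps save the document themselves, which would otherwise fire the handler again and loop.

```python
def make_reprocess_handler(build_pipeline):
    """Return a receiver with the signature Django uses for post_save.

    build_pipeline is a hypothetical callable: it takes the saved document
    and returns an object with a process_everything() method.
    """
    in_progress = set()  # primary keys of documents being re-processed right now

    def reprocess_on_save(sender, instance, created=False, **kwargs):
        key = instance.pk
        if key in in_progress:
            # This save came from a preprocess step, not from the user: skip it.
            return
        in_progress.add(key)
        try:
            build_pipeline(instance).process_everything()
        finally:
            in_progress.discard(key)

    return reprocess_on_save


# Wiring inside an IEPY instance would look roughly like this
# (assumption: the model import path; untested against IEPY itself):
#
#   from django.db.models.signals import post_save
#   from iepy.data.models import IEDocument
#   post_save.connect(make_reprocess_handler(build_pipeline), sender=IEDocument)
```

Keeping the pipeline construction behind a callable means the handler can be exercised with a stand-in object before it is connected to the real model.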

SindhuBairavi commented 7 years ago

Hi, would this post-save handler also help in updating the relative tables involved? I can try adding it, but I need help understanding the current scripts.

jmansilla commented 7 years ago

Sindhu:

Yes, with a post-save handler you should be able to update whatever needs updating.

That said, I'm not sure I understand what you mean.

Looking back at your original post, you said "Even if i delete the record and reload the modified content as a new record, the relative tables don't get updated" and I'm wondering: what do you mean? Maybe I misread you.

Can you explain what those "relative tables" are? What information do they store, and what changes would you want to apply to them?

SindhuBairavi commented 7 years ago

Hi, what I meant is: if I update the content in the same record, the other tables, such as entities and segments, do not get updated, so they retain the original content's data. If I delete and then reload the record, I need to manually check the related tables and update or delete those rows as well. Let me know if I need to elaborate.

jmansilla commented 7 years ago

OK. As explained, no: IEPY does not re-run anything when content changes.

However, removing a document, re-inserting it, and then re-running preprocess should do the trick. Segments, EntityOccurrences, and other related objects are removed when a document is removed.
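
One likely reason a hand-written delete leaves those rows behind: Django emulates cascading deletes in the ORM rather than in the database, so a raw SQL `DELETE` on the document table does not touch related rows. The sketch below shows the difference with the standard-library `sqlite3` module. The `document` and `segment` tables and the helper names are made up for illustration and are not IEPY's real schema.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with one document and two segments."""
    conn = sqlite3.connect(":memory:")
    # Foreign keys are not enforced, as in a default SQLite connection.
    conn.execute("PRAGMA foreign_keys = OFF")
    conn.executescript("""
        CREATE TABLE document (id INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE segment (
            id INTEGER PRIMARY KEY,
            document_id INTEGER,
            text TEXT
        );
        INSERT INTO document VALUES (1, 'old content');
        INSERT INTO segment VALUES (10, 1, 'old');
        INSERT INTO segment VALUES (11, 1, 'content');
    """)
    return conn


def raw_delete(conn, doc_id):
    """What a direct SQL statement does: only the document row goes away."""
    conn.execute("DELETE FROM document WHERE id = ?", (doc_id,))


def cascading_delete(conn, doc_id):
    """What an ORM-level cascade amounts to: related rows first, then the document."""
    conn.execute("DELETE FROM segment WHERE document_id = ?", (doc_id,))
    conn.execute("DELETE FROM document WHERE id = ?", (doc_id,))


def count_segments(conn):
    return conn.execute("SELECT COUNT(*) FROM segment").fetchone()[0]


if __name__ == "__main__":
    db = make_db()
    raw_delete(db, 1)
    print("segments left after raw delete:", count_segments(db))

    db = make_db()
    cascading_delete(db, 1)
    print("segments left after cascading delete:", count_segments(db))
```

Deleting through the Django admin or the ORM performs the second variant for you, which is why the orphaned rows only show up after a direct database delete.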

How were you deleting Documents?

SindhuBairavi commented 7 years ago

Oh. Can you tell me how to remove a document? I've been modifying the database directly with a DELETE statement, which I'm assuming is incorrect.

Can I remove a document from a script instead of the UI?

jmansilla commented 7 years ago

Yes, you can remove Documents from the UI.

If you start the web server with

python bin/manage.py runserver

you can then access and remove your documents at http://127.0.0.1:8000/admin/corpus/iedocument/ (that is your local web server address).