Separate the whole data transformation as a different process

joohokim1 commented 6 years ago

Is your feature request related to a problem? Please describe.

The whole data transformation currently runs in the same process as discovery, so if something like an OOM (out-of-memory) error happens during transformation, it can affect the entire discovery process.

Describe the solution you'd like

It would be better to run the whole data transformation in a separate process.
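As a rough illustration of the proposal, here is a minimal sketch of launching a job in a child JVM with its own heap limit, so that an out-of-memory failure stays in the child and the parent only sees an exit code. This is not the project's actual implementation: the class name `TransformLauncher`, the entry point `example.TransformMain`, and the `512m` heap value are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs a transformation entry point in a separate JVM.
class TransformLauncher {

    // Builds the command line for a child JVM that reuses the parent's
    // classpath but gets its own maximum heap size (for example "512m").
    static List<String> buildCommand(String mainClass, String maxHeap, List<String> args) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + maxHeap);
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        cmd.addAll(args);
        return cmd;
    }

    // Starts the child JVM, forwards its output to the parent's console,
    // and blocks until it exits. Returns the child's exit code.
    static int runInChildJvm(String mainClass, String maxHeap, List<String> args)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(buildCommand(mainClass, maxHeap, args))
                .inheritIO()
                .start();
        return child.waitFor();
    }
}
```

A caller could invoke `TransformLauncher.runInChildJvm("example.TransformMain", "512m", List.of("snapshot-1"))` and treat a non-zero exit code as a failed transformation. If the child runs out of heap, the parent process keeps running and can report the failure.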

Describe alternatives you've considered

One alternative is to predict the memory that data preparation will consume and block the job in advance if it would not fit, but that prediction is not easy to make. Users would probably also prefer that the job be attempted as far as it can go in a separate process.
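To make the difficulty concrete, here is a minimal sketch of what such a pre-check could look like. It is an assumption for illustration only, not code from the project: the class `MemoryPrecheck` and the `expansionFactor` parameter are hypothetical, and the factor itself is a guess about how much larger the data becomes once loaded into memory.

```java
// Hypothetical pre-check: refuse a job whose estimated footprint exceeds the free heap.
class MemoryPrecheck {

    // Crude estimate: in-memory size is assumed to be the input size times a
    // guessed expansion factor. The real factor depends on column types,
    // string lengths, and the transformation rules applied.
    static long estimateBytes(long inputFileBytes, double expansionFactor) {
        return (long) (inputFileBytes * expansionFactor);
    }

    // Heap still available to this JVM: the maximum heap minus what is in use.
    static long availableHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    // True when the estimate is below the given free-heap figure.
    static boolean likelyFits(long estimatedBytes, long freeHeapBytes) {
        return estimatedBytes < freeHeapBytes;
    }
}
```

The check is only as good as the expansion factor. If the factor is too low, the job still runs out of memory. If it is too high, jobs that would have succeeded are rejected.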

Additional context

Just as with the Spark external engine, you can think of this as a pure Java external engine.

babokim commented 6 years ago

Besides this feature, how about also considering running the ingestion-related functions as a separate process or daemon? Ingestion after uploading an attached file seems to use a lot of memory in some cases.

Taehui commented 6 years ago

The same goes for downloads. It would be good to review them together.

joohokim1 commented 6 years ago

@babokim @Taehui Thank you for your input. I think it would be better to handle the dataprep snapshot first in this issue, and then take up ingestion and download as separate issues. Since they are related to each other, those later issues should of course build on the work done here.

joohokim1 commented 6 years ago

The reason I want to proceed this way is that separating the package into different artifacts is delicate work and involves quite a lot of tasks.

joohokim1 commented 6 years ago

This issue cannot be done in 3.0.6. Moving to 3.0.7.

metatron-app / metatron-discovery, issue #373