i2b2 / i2b2-workinggroup-transmart-etl

6 stars 2 forks source link

Text search is not working with OBSERVATION_BLOB column #1

Open nabuelrub opened 4 years ago

nabuelrub commented 4 years ago

We have moved from storing text notes in TVAL_CHAR to OBSERVATION_BLOB in the Observation_fact table. At first, The CLOB text search was working fine, then it started returning 0 for any search queries. Any thoughts on this odd behavior for the CLOB column?

snmurphy0 commented 4 years ago

Sometimes it does that when it is not finished (background) indexing.

Thanks, Shawn.

From: Noor notifications@github.com Sent: Thursday, February 13, 2020 1:34 PM To: i2b2/i2b2-transmart-etl i2b2-transmart-etl@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [i2b2/i2b2-transmart-etl] Text search is not working with OBSERVATION_BLOB column (#1)

    External Email - Use Caution

We have moved from storing text notes in TVAL_CHAR to OBSERVATION_BLOB in the Observation_fact table. At first, The CLOB text search was working fine, then it started returning 0 for any search queries. Any thoughts on this odd behavior for the CLOB column?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/i2b2/i2b2-transmart-etl/issues/1?email_source=notifications&email_token=ADHNRESSSCKYI6ZRLOYEVHDRCWHATA5CNFSM4KUYOQZKYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4INLJCUA, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ADHNRERQDNIAF3BBGF4M6BDRCWHATANCNFSM4KUYOQZA.

The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Partners Compliance HelpLine at http://www.partners.org/complianceline . If the e-mail was sent to you in error but does not contain patient information, please contact the sender and properly dispose of the e-mail.

nabuelrub commented 4 years ago

Thank you, Shawn, for the reply. I don't think this is related to the background indexing, I verified the indexes and they are already built and our tables are static.

I did more investigation and this is what we found

Below is the i2b2 query which returns 0 :

select /*+ index(observation_fact fact_cnpt_pat_enct_idx) */ f.patient_num from BlueHerondata.observation_fact f where f.concept_cd IN (select concept_cd from BlueHerondata.concept_dimension where concept_path like '\i2b2\Reports\Visit Notes\Result Notes\Surgical Pathology\%') AND ( modifier_cd = '@' AND valtype_cd = 'B' AND contains(observation_blob,'graft') > 0 )

Two ways I got this working:

  1. If contains is replaced with DBMS_LOB.INSTR it returns the relevant records, which suggests the issue has to do with the text index.

  2. Change the index hint to force it to use the text index of the CLOB select /*+ index(f observation_text) */ f.patient_num from nightHerondata.observation_fact f where f.concept_cd IN (select concept_cd from nightHerondata.concept_dimension where concept_path like '\i2b2\Reports\Visit Notes\Result Notes\Surgical Pathology\%') AND ( modifier_cd = '@' AND valtype_cd = 'B' AND contains(observation_blob,'graft') > 0 ) Explain plan shows that the index hint is forcing to use the text index observation_text while in the original query the text index was not used. (Oracle)

We started using CLOB a year ago, and the search feature was working fine. We are concerned this happened when we increased the number of CLOB records from 500K to 65m. Do you know anyone using CLOB on such a large scale in I2B2?