mdsol / rwslib

Provide a (programmer) friendly client library to Rave Web Services (RWS).
MIT License
31 stars 13 forks source link

Double quotes escaped when fetching CSV metadata #126

Open pscohy opened 3 years ago

pscohy commented 3 years ago

Hello,

We are using your nice library to fetch the data and metadata from Rave. When we fetch the metadata in CSV format, the double quote seems to be escaped with a back-tick, leading to some error on our side since we are checking that the projectname is the same in the response than in the request.

We receive something like: projectname,viewname,ordinal,varname,vartype,varlength,varformat,varlabel "{projectName}","{ViewName}","{ordinal}","{varname}","{vartype}","{varlength}","{varformat}","{varlabel}" ...

Did you already do something in the Library to solve this issue ? Or do you know what is the best way to solve this issue? I saw in the Rave documentation that someone had the same issue, so maybe if it is a common issue you have done something for this.

Already many thanks for the time you'll spent to answer my questions.

Have a nice day

PS: it appears only with the metadata, when fetching the clinical data, the double quote is not escaped with a back-tick

isparks commented 3 years ago

Hi @pscohy, I am not sure I am getting the same result as you when I test. I created a project called "ESCAPE/PROJECT" - the name includes the quotes and the /. When I ask for the Clinical View columns metadata with:

from rwslib import RWSConnection
from rwslib.rws_requests.biostats_gateway import ProjectMetaDataRequest
r = RWSConnection(host, username, password)
csv_meta = r.send_request(ProjectMetaDataRequest('"ESCAPE/PROJECT"'))

print(csv_meta)

I get this:

projectname,viewname,ordinal,varname,vartype,varlength,varformat,varlabel
"""ESCAPE/PROJECT""","V__ESCAPE_PROJECT__DOV","1","userid","num","8","10.","Internal id for the user"
"""ESCAPE/PROJECT""","V__ESCAPE_PROJECT__DOV","2","projectid","num","8","10.","projectid"
"""ESCAPE/PROJECT""","V__ESCAPE_PROJECT__DOV","3","project","char","255","$255.","project"
...

So I do not get back-ticks, but I do get expansion of quotes. Are you using different quotes like ' ?

pscohy commented 3 years ago

Hi @isparks ,

Thanks for your quick answer 😊

It is something strange that we experience with only one study. The thing that is different from the other is that it is a Japanese study. Can it be related?

Also, we've found this in the documentation, do you know if there is something on going to investigate/fix the issue? documentation where the issue was raised: https://learn.mdsol.com/api/rws/retrieve-clinical-view-form-datasets-as-csv-95587309.html

Screenshot 2021-08-13 at 10 36 01
isparks commented 3 years ago

Hi @pscohy - it seems likely to be related but when using that endpoint which is for clinical data I get the same kind of escaping. I also tried to enter some clinical data that might need escaping in the output to see if the same concept is being applied. I got output like this:

.......,"","","""DOUBLE QUOTED""","",""
.......,"","","'SINGLE QUOTED'","",""
.......,"","","`Back quoted`","",""
.......,"","","With \ Backslash","",""
.......,"","","Comma and Quote ,""","",""
.......,"","","Quote and Comma "",","",""

So far, was not able to get a backtick to appear in the quoting.

I haven't worked at Medidata for more than 5 years at the time of writing so I can't say if there is any ongoing investigation. After 1 year (from May 28, 2020) it seems unlikely that any more investigation is happening on this? But you could ask again on learn.mdsol.com.

Can you share the name of the study/project? Is it named using only ASCII characters or does it include Japanese characters?

My thinking is that it must be something related to the study name - what special characters does the study name include?

pscohy commented 3 years ago

@isparks

I will not be able to share with you the name of the study since it is a study of one of our client and we are not allowed to share this kind of information outside our company. But I can say that the name of the project itself is in ascii not japanese. What is in japanese are the values they entered on the platform, all the structure is defined in english

What I can tell is that there are indeed japanese character in their clinical data.

To be able to fetch their data, we had to change the default encoding to use UTF-8 and use the parameter unicode=true (when we fetch the audit records).

What seems strange is that the back-tick and double quote only appears in the metadata that does not contain any japanese character. We don't have the back-tick and double quote in the response of the clinical data that contains japanese character.

Also, it seems that the unicode=true parameter is not handled in this call , because I only receive the headers and not the content {hostName}/RaveWebServices/datasets/ClinicalViewMetadata.csv?ProjectName={projectName} (meaning with this method: ProjectMetaDataRequest(project_name)). In the code it looks like this def get_metadata_csv(self, trialname): return self._send_request( ProjectMetaDataRequest(trialname, dataset_format="csv") )

isparks commented 3 years ago

@pscohy I understand about not sharing the name and I understand it is all in ASCII. So are there any characters in the name which are not in the A-Z range? Any punctuation like /,*`"'~ etc?

When you make a call like this for a project that doesn't exist:

r = RWSConnection(host, username, password)
csv_meta = r.send_request(ProjectMetaDataRequest('NO SUCH PROJECT'))

you get results like this:

projectname,viewname,ordinal,varname,vartype,varlength,varformat,varlabel
EOF

This is just how that endpoint works for RWS. You can check the last request made like:

r = RWSConnection(host, username, password)
csv_meta = r.send_request(ProjectMetaDataRequest('NO SUCH PROJECT'))
print(r.last_result.url) # <-- What URL did you actually call?

And check it to make sure it is the correct URL

pscohy commented 3 years ago

Thanks for the last explanation, it makes sense

For the name of the study, it looks like this X111111-XX111 (where X is an alpha char and 1 is a numeric char).

We have other customer with study name like 1111-XX-1111 and we don't have this issue, so I doubt it is related to the name of the study, but it could be

isparks commented 3 years ago

Well, it continues to be very mysterious but to confirm, the extra ` only appears in the project name column like:

projectname,viewname,ordinal,varname,vartype,varlength,varformat,varlabel
"X111111-XX111`","V__X111111_XX111__DOV","1","userid","num","8","10.","Internal id for the user"

or does it appear in all the columns?

pscohy commented 3 years ago

In all columns like

projectname,viewname,ordinal,varname,vartype,varlength,varformat,varlabel `"X111111-XX111`",`"prod.V_X111111_XX111_AE`",`"1`",`"userid`",`"num`",`"8`",`"10.`",`"Internal id for the user`"

isparks commented 3 years ago

This is strange and I am sorry I have not been able to help explain it or debug it. One question, if you enter the url manually into a browser and download the file that is produced, does it have the backticks? I do not believe that rwslib could be introducing these but I would like to be sure.

If rwslib isn't introducing them then I don't think I can help more, You will have to ask Medidata. Probably removing the backticks isn't too painful? If the first character of the second line of the result starts with a ` then you know that its backtick-quote delimited, otherwise its quote delimited?

pscohy commented 3 years ago

@isparks , when doing the request in postman, I have the same issue so you are right, it is not introduce by RWSLIB.

Many thanks for your help and time on this, it is really appreciated.

Have a nice day

glow-mdsol commented 3 years ago

I'm checking with one of the experts here, suspect it's something bubbling up from SQL Server.

glow-mdsol commented 3 years ago

Is this something new or has it always been like this?

pscohy commented 3 years ago

@glow-mdsol ,

I would both of them. We have this issue since we are trying to fetch the data through Rave webservices. But we are trying to fetch the data since 2 weeks. So it was always there for us, but maybe it was not present before?

glow-mdsol commented 3 years ago

Ok, there's an internal investigation on this I think. Will let you know what is found.

pscohy commented 3 years ago

@glow-mdsol ,

Many thanks. I'm waiting for your future investigation and feedback 😄