mysociety / citizenconnect

Citizen Connect project for the NHS: reporting problems, leaving reviews
https://www.nhs.uk/careconnect/choices
Other
1 stars 0 forks source link

Cron errors: Invalid XML being pushed to the Choices API #965

Closed evdb closed 11 years ago

evdb commented 11 years ago

The pen testers have been submitting dodgy content which we are then trying to push to Choices.

I think that the issues is not our end, as none of what I inspected looks like something that would upset XML, so perhaps we are trying to push to an API that does not like the content we're sending them, perhaps expecting a different schema?

From general.yml.deployed:

NHS_CHOICES_API_KEY: 'U---------------' # redacted here, correct in actual config
NHS_CHOICES_BASE_URL: 'http://v1.syndication.nhschoicespreview.nhs.uk/'
NHS_CHOICES_POSTING_ORGANISATION_ID: 15

Cron errors are:

Cron <citizenconnect@stork> run-with-lockfile -n /data/vhost/citizenconnect.mysociety.org/push_new_reviews_to_choices.lock "/data/vhost/citizenconnect.mysociety.org/citizenconnect/bin/cron_wrapper.bash push_new_reviews_to_choices" || echo "stalled?"

1: The XML has invalid fields
2: The XML has invalid fields
3: The XML has invalid fields
4: The XML has invalid fields
5: The XML has invalid fields
6: The XML has invalid fields
... lots more ...

Looking in the DB we have entries like:

citizenconnect=> select * from reviews_submit_review where id = 1;
-[ RECORD 1 ]-------+------------------------------
id                  | 1
created             | 2013-07-10 10:11:27.077489+01
modified            | 2013-07-10 10:11:27.077531+01
email               | cha[redact]ell@activityim.com
display_name        | Acticivity test 1
is_anonymous        | t
title               | test
comment             | blah blah blah
month_year_of_visit | 2013-02-01
organisation_id     | 1
last_sent_to_api    | 

citizenconnect=> select * from reviews_submit_review where id = 50;
-[ RECORD 1 ]-------+------------------------------
id                  | 50
created             | 2013-07-10 10:31:59.088418+01
modified            | 2013-07-10 10:31:59.088467+01
email               | jon.doe@test.com
display_name        | jon doe
is_anonymous        | t
title               | '
comment             | '')waitfor delay'0:0:20'--
month_year_of_visit | 2012-01-01
organisation_id     | 1
last_sent_to_api    | 

citizenconnect=> select * from reviews_submit_review where id = 900;
-[ RECORD 1 ]-------+-----------------------------------------------------------------------
id                  | 900
created             | 2013-07-10 10:39:33.644278+01
modified            | 2013-07-10 10:39:33.644325+01
email               | jon.doe@test.com
display_name        | jon doe..\..\..\..\..\..\..\..\..\..\..\..\..\..\..\..\windows\win.ini
is_anonymous        | t
title               | '
comment             | '
month_year_of_visit | 2012-01-01
organisation_id     | 1
last_sent_to_api    | 
evdb commented 11 years ago

Ah - one thought is that having cleared the database we might be conflicting with previous posts we've made to the API, as we've restarted our ids at 1 but kept the same posting organisation id of 15?

stevenday commented 11 years ago

This is a possibility, another is that the organisations we're posting reviews about have dodgy choices ids/ods_codes too, as I just made them up for the testing?

chrismytton commented 11 years ago

It seems that the question ids we have for hospitals are incorrect. The POST request gets a 403 error (the one @evdb mentioned above) when I remove <Rating QuestionID="6">5</Rating> from the xml.

Working XML

<?xml version="1.0" encoding="utf-8"?>
<Opinions xmlns="http://v1.syndication.nhschoices.nhs.uk/Comments">

    <Opinion>
        <Author>channon.powell@activityim.com</Author>
        <DisplayName>Acticivity test 1</DisplayName>
        <Title>test</Title>
        <Comment>blah blah blah</Comment>
        <MonthYearOfVisit>02/2013</MonthYearOfVisit>
        <PostingID>1</PostingID>
        <PostingOrganisationID>15</PostingOrganisationID>
        <DateSubmitted>2013-07-10T09:11:27.077489+00:00</DateSubmitted>
        <Ratings>
            <Rating QuestionID="7">3</Rating>
            <Rating QuestionID="5">5</Rating>
            <Rating QuestionID="4">1</Rating>
            <Rating QuestionID="3">4</Rating>
            <Rating QuestionID="2">1</Rating>

        </Ratings>
    </Opinion>
</Opinions>

So I think we need to get an updated set of ratings questions and answers xml for the different organisation types that we handle then load that in to make these work.

In the meantime I'll mark these reviews as sent to stop the cron errors.

chrismytton commented 11 years ago

Question id=6 is definitely in the API, also posting question ids that don't exist also works, so it seems to be a bug with the API.

Have emailed David Pollard and Matt George about the issue.

chrismytton commented 11 years ago

For reference this is the email sent to David Pollard/Matt George on 12th July at 12:37:

Hi David/Matt,

I'm running into an error when trying to post reviews to the Choices API.

I'm currently getting a 400 status code back ("The XML has invalid
fields") when posting reviews to the API. After some digging around it
seems that the API is only returning this error when the xml includes
`<Rating QuestionID="6">5</Rating>` inside the `<Ratings>` element.

Looking at the list of questions and answers at
http://v1.syndication.nhschoices.nhs.uk/organisations/hospitals/questionsandanswers
there is a question with id=6 in that list.

I have attached a copy of the XML that we're trying to POST to
http://v1.syndication.nhschoicespreview.nhs.uk/comment/F84021.

Hopefully you can tell me where I'm going wrong!

Cheers,

Chris
chrismytton commented 11 years ago

I believe this is being handled by Richard Moore from the Choices team:

Moore Richard (CHOICES) richard.moore10@nhs.net
23rd July 15:37

Hi Chris

Can you please send me the XML for the comment post as it was not attached to your previous email?

Thanks

Richard

I have replied to Richard's email with a copy of the XML he requested.

chrismytton commented 11 years ago

From Richard Moore (23rd July 16:00)

Hi Chris

The comment was posted successfully. I believe that there were some formatting errors in the XML as I did not change any of the values

Regards

Richard

The XML was as follows:

<?xml version="1.0" encoding="UTF-8"?>
<Opinions xmlns="http://v1.syndication.nhschoices.nhs.uk/Comments">
<Opinion>
<Author>channon.powell@activityim.com</Author>
<DisplayName>Acticivity test 1</DisplayName>
<Title>test</Title>
<Comment>blah blah blah</Comment>
<MonthYearOfVisit>02/2013</MonthYearOfVisit>
<PostingID>333</PostingID>
<PostingOrganisationID>15</PostingOrganisationID>
<DateSubmitted>2013-07-10T09:11:27.077489+00:00</DateSubmitted>
<Ratings>
<Rating QuestionID="7">3</Rating>
<Rating QuestionID="6">5</Rating>
<Rating QuestionID="5">5</Rating>
<Rating QuestionID="4">1</Rating>
<Rating QuestionID="3">4</Rating>
<Rating QuestionID="2">1</Rating>
</Ratings>
</Opinion>
</Opinions>
chrismytton commented 11 years ago

I have replied to Richard's email with further details:

Hi Richard,

I've just tested this again and it's still giving me an error. When I
run the following command with the XML I sent you I receive a 400
status code (API key redacted).

curl -v -X POST -d @review.xml
'http://v1.syndication.nhschoicespreview.nhs.uk/comment/F84021?apikey=XXXXXX'
-H 'Content-Type: application/xml'

However if I edit the XML file and remove the line `<Rating
QuestionID="6">5</Rating>` then re-try, I get a 403 error (which is
what I would expect).

I hope that is clearer.

Kind regards
chrismytton commented 11 years ago

The mystery continues:

From Richard:

I tested the XML in the live environment rather than in preview using the URL below

https://v1.syndication.nhschoices.nhs.uk/comment/<NACS>?apikey=<APIKey>

None of the rating questions are mandatory so adding/deleting one should not make a difference

Try posting into live using the URL above. The moderators will delete your comment

My reply:

Still seems to be an issue on the live environment. Full output pasted below (API key redacted again).

$ curl -si -X POST -d @review.xml -H 'Content-Type: application/xml 'http://v1.syndication.nhschoices.nhs.uk/comment/F84021?apikey=XXXXXX''
HTTP/1.1 400 Bad Request
Content-Length: 719
Content-Type: text/html
X-AspNet-Version: 2.0.50727
Expires: Tue, 23 Jul 2013 15:45:15 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Tue, 23 Jul 2013 15:45:15 GMT
Connection: close
Set-Cookie: cookie=R2171090599; path=/



<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
    <title>NHS Choices Syndication</title>
    <meta http-equiv="Content-Style-Type" content="text/css" />

</head>
<body>
    <div id="MainContent">
        <img style="border: 0; width: 1px; height: 1px;" alt="" src="http://statse.webtrendslive.com/dcss9yzisf9xjyg74mgbihg8p_8d2u/njs.gif?dcsuri=/&amp;wt.js=no&amp;wt.cg_n=syndication"/>

    <h1>NHS Choices Syndication - Status Code 400</h1>
    <h2>400 BadRequest</h2>
    <p>XML Payload contains missing or incorrect values.</p>

    </div>
</body>
</html>
$

Richard's reply:

Did you change the posting id? I have already used 333

My reply:

A duplicate posting ID would return a 403, which is what I would expect. The problem is I am getting a 400 "XML Payload contains missing or incorrect values.".

(The subject of the email is "400 error")

chrismytton commented 11 years ago

Richard's reply

You are getting a 400 error as you are posting a hospital comment with an incorrect ODS code – try posting with RQXM1 for Homerton. You will need to use a posting ID of 3 as I have used 1 & 2

chrismytton commented 11 years ago

My reply:

I am only getting a 400 error when the XML includes <Rating QuestionID="6">5</Rating>, when it doesn't contain that the POST works successfully. Can you reproduce this behaviour?

chrismytton commented 11 years ago

Getting there:

Richard:

No I cannot. It works fine and I am getting a 202. As I have mentioned previously, the only 400 error I am getting is when I use your ODS code. Can you please try with the ODS code I supplied?

Me:

Apologies, posting using that ODS code does indeed give me back the 403 I was hoping for, (I was thrown off the scent because the html output contains <h2>400 BadRequest</h2>!).

However according to the API documentation it should return a 404 status code when the ODS code isn't valid (e.g. if I use XXX as the ODS code it correctly returns a 404 status code).

We use the status codes that are returned from the API to determine how to handle the error, so it would be preferable for an invalid ODS code to consistently return a 404 "The NACS code is not valid" rather that a 400 "The XML has invalid fields" so we know what the problem is.

Can you let me know when this is fixed and I will test again.

:cry:

chrismytton commented 11 years ago

Latest from Richard:

I will raise a service desk ticket for the issue. However, the fix will take time to be implemented as it will not be classed as a priority.

chrismytton commented 11 years ago

Closing as this is an issue with the Choices API, not Care Connect.