huginn / huginn

Create agents that monitor and act on your behalf. Your agents are standing by!
MIT License
42.94k stars 3.75k forks

Is there any way to get the sum of the numbers in the field such as description? #2104

Open tortoo opened 7 years ago

tortoo commented 7 years ago

I know Liquid has a plus filter, but it can only add a fixed number to a value taken from the description field (or any other field). Is there any way to sum the numbers inside the description field, or to do simple calculations with them?

dsander commented 7 years ago

Hi @tortoo,

I am pretty sure there is a way. Do you have an example event?

tortoo commented 7 years ago

Hi @dsander, glad to see you, and thank you for helping!😸

The following is my Trigger agent:

{
  "expected_receive_period_in_days": "4",
  "keep_event": "true",
  "rules": [
    {
      "type": "regex",
      "value": "\\+",
      "path": "description"
    }
  ],
  "message": "{{description | regex_replace:'([^+\\d]|(?<!\\+)\\d)', '' }}"
}

And here is an event it outputs:

[
  {
    "id": "",
    "url": "https://buluo.qq.com/mobile/detail.html?bid=144106&pid=5794662-1503831151_144106_&source=share_app",
    "urls": [
      "https://buluo.qq.com/mobile/detail.html?bid=144106&pid=5794662-1503831151_144106_&source=share_app"
    ],
    "links": [
      {
        "href": "https://buluo.qq.com/mobile/detail.html?bid=144106&pid=5794662-1503831151_144106_&source=share_app"
      }
    ],
    "title": "",
    "description": "1970-01-01 08:00│💬│aa🗣+3&nbsp;哈哈│cc🗣嘿嘿。+5│aa🗣666。+2│cc🗣好。+1│cc🗣+{{10086:0}}│c🗣c+1000│aa🗣不错",
    "image": null,
    "enclosure": null,
    "authors": [

    ],
    "categories": [

    ],
    "date_published": null,
    "last_updated": null,
    "message": "+3+5+2+1++1"
  }
]

The Trigger Agent receives events from an RssAgent whose feed is generated by a PHP script written by someone else, and I don't know PHP well😹. In that feed, the nickname and the content of each comment end up together in the 'description' field. To get the sum of the numbers, the only thing I can do is remove everything except the numbers that follow a '+'. I thought about generating the RSS with Huginn instead, but I don't know JavaScript well either🙈. The site I need to handle is Site, and I think the only way to capture each comment separately is a JavaScript Agent, which is very hard for me🙈. Here is another sample page with 20+ comments: Sample2

I see two options. One is a way to get the sum of this: "message": "+3+5+2+1++1", which looks simpler. The other is a JavaScript Agent example for the site above. It could perhaps separate the nickname and the comment into different fields, but even then I still wouldn't know how to add up numbers from different fields.

That was a lot 🙈. Thank you for reading it all and for working so hard. Best regards!

dsander commented 7 years ago

Nice job with the regular expression! That looks like a complex string to clean up. I thought Liquid had a sum filter, but sadly it doesn't. Because of that it gets a bit more complicated:

{% assign array = message | split: '+' | compact %}{% assign sum = 0 %}{% for number in array %}{% assign sum = sum | plus: number %}{% endfor %}{{sum}}

First we are generating an array from the string in message (splitting the string and removing the empty fields). Then the for loop iterates over the array, sums up the values and returns the sum at the end.
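For readers more at home in JavaScript than in Liquid, the same split-and-add idea can be sketched as below. This is only an illustration of the logic, not something Huginn requires; `sumVotes` is a hypothetical helper name, and the input is assumed to be an already cleaned string like the message field above.

```javascript
// Hypothetical helper: sums the numbers in a cleaned string such as "+3+5+2+1++1".
function sumVotes(message) {
  return message
    .split('+')                                        // "+3+5" -> ["", "3", "5"]
    .filter(function (part) { return part !== ''; })   // drop the empty entries
    .reduce(function (sum, part) {
      return sum + parseInt(part, 10);                 // add each number to the running total
    }, 0);
}

console.log(sumVotes('+3+5+2+1++1')); // 12
```

Starting the running total at 0 plays the same role as `{% assign sum = 0 %}` in the Liquid version.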

tortoo commented 7 years ago

I'm so flattered😹! I'm still digesting what you wrote and reading up on Liquid. It's the first time I've used the {% syntax in Liquid, and I'm a little confused about where to put it. I did this in my Trigger Agent but it didn't work🙈.

"message": " {{description | regex_replace:'([^+\d]|(?<!\+)\d)', '' }}{% assign array = message | split: '+' | compact %}{% assign sum = 0 %}{% for number in array %}{% assign sum = sum | plus: number %}{% endfor %}{{sum}}"

tortoo commented 7 years ago

Sorry @dsander, it was a beginner's mistake: after I put it in the next agent (a Slack Agent) it works! It seems I haven't understood Huginn well yet, but I'll make an effort. There is a lot of information in your code, and it opens the door to Liquid for me. I'll try to divide the sum by the number of comments that contain a +, and to get the sum per nickname. After that the issue can be closed.😹 Thank you very much for helping!

dsander commented 7 years ago

Nice, glad you got it working! The version you posted does not work because the message that is used in the first assign tag is not cleaned yet. If you move the regex_replace filter into the assign chain it should work in the formatter Agent as well: {% assign array = description | regex_replace:'([^+\d]|(?<!\+)\d)', '' | split: '+' | compact %} .....

To get the vote average, have a look at the size and divided_by filters. You might need to assign the array size to another variable before you can use it in divided_by. If two integers are divided by each other, the result will also be an integer and not a float. You can 'type cast' one of the integers by appending .0: {{ "12" | append: ".0" | divided_by: 10 }}

tortoo commented 7 years ago

I got it! After placing the regex_replace filter into the chain it worked.

I tried to get the vote average and ran into some problems. First I tested the size filter with the following:

"message": "{{description | regex_replace:'([^+\d]|(?<!\+)\d)', '' | split: '+' | compact | size }}"

And the output is "message": "7".

In fact there are only 5 numbers: without the size filter the output was "message": "35211". It seems the output shouldn't contain anything except the numbers. I'm so confused😹.

Below is my attempt at the average, but Huginn reports a Liquid error. Also, I don't understand what this statement does: {% assign sum = 0 %}.

"message": "{% assign array = description | regex_replace:'([^+\d]|(?<!\+)\d)', '' | split: '+' | compact %}{% assign sum = 0 %}{% for number in array %}{% assign sum = sum | plus: number %}{% endfor %}{{sum}}{% assign average = 0 %}{% assign amount = array | size %}{% for amount in array %}{% assign average = sum | append: '.0' | divided_by: amount %}{% endfor %}{{average}}"

dsander commented 7 years ago

In fact there are only 5 numbers: without the size filter the output was "message": "35211". It seems the output shouldn't contain anything except the numbers. I'm so confused😹.

You are right, the compact filter only removes nil (i.e. undefined) values from an array, not empty strings. We can remove the unwanted empty entries by filtering the string a bit more: {% assign array = description | regex_replace:'([^+\d]|(?<!\+)\d)', '' | regex_replace: '\+\+', '' | regex_replace: '\A\+', '' | split: '+' %}

I think that should give the average: {% assign array = description | regex_replace:'([^+\d]|(?<!\+)\d)', '' | regex_replace: '\+\+', '' | regex_replace: '\A\+', '' | split: '+' %}{% assign sum = 0 %}{% for number in array %}{% assign sum = sum | plus: number %}{% endfor %}{% assign amount = array | size %}{{ sum | append: '.0' | divided_by: amount }}

You do not need to iterate over the array to get the average.
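As a cross-check of the reasoning, here is the same calculation as a small JavaScript sketch. `averageVotes` is a hypothetical name, and the input is assumed to be the cleaned message string from earlier in the thread. JavaScript's `split` keeps empty strings, which mirrors the size of 7 reported above, and its division always yields a float, so nothing like `append: '.0'` is needed here.

```javascript
// Splitting on '+' leaves empty strings where two '+' meet or the string starts with '+'.
var parts = '+3+5+2+1++1'.split('+');
console.log(parts.length); // 7, because two of the entries are empty strings

// Hypothetical helper: average of the numbers in a cleaned string.
function averageVotes(message) {
  var numbers = message
    .split('+')
    .filter(function (part) { return part !== ''; })  // keep only real numbers
    .map(Number);
  if (numbers.length === 0) {
    return null;                                       // nothing to average
  }
  var sum = numbers.reduce(function (a, b) { return a + b; }, 0);
  return sum / numbers.length;                         // one division, after the loop
}

console.log(averageVotes('+3+5+2+1++1')); // 2.4
```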

tortoo commented 7 years ago

I got it! The division happens once, while the addition is repeated for every number. 😸 I tested your code many times and finally found that a + was missing here: regex_replace: '\+\+', '+'. After adding it, the result is correct. When 2.4 appeared on the screen I was very excited! My last question: if I want the sum of the votes given by the same nickname across different topic posts, what should I do? A nickname can only vote once per topic post.

dsander commented 7 years ago

My last question: if I want the sum of the votes given by the same nickname across different topic posts, what should I do? A nickname can only vote once per topic post.

For that you would have multiple events (one for every post) with a list of usernames who voted on that post? I think that can only be done in a JavaScriptAgent in which you store a Hash of usernames and count the number of times each name was included in an Event.

tortoo commented 7 years ago

The counting isn't about how many times but about the numbers (scores). The score of each vote is taken from the user who gave it, so I need the sum of the vote scores per user (not of all users together). Following your suggestion, I think the counting would start with the first event that contains a vote by a given username and be repeated every time a vote by the same user appears in another event. But over time the number of events would become huge, and the events would have to be kept forever. Besides, I don't have a JavaScriptAgent example for this. I wonder whether there is a way to store the running total in one place, add each new number to it, and do the counting in an agent with Liquid? Maybe the LiquidOutputAgent or something else?

dsander commented 7 years ago

The LiquidOutputAgent would work if the data was already normalized. Your events would look something like this, right?

{ name: 'username1', score: 1, page: 'something' }
{ name: 'username2', score: 4, page: 'something' }
{ name: 'username1', score: 3, page: 'something else' }

I don't think it is possible to group the scores by username just by using Liquid.

The JavaScriptAgent can store any data in its memory, so you could take the incoming events, create a memory entry for every username and store their scores. When the Agent is checked (run on schedule) it could emit the score per username.
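The grouping step itself is plain JavaScript and can be tried outside Huginn first. The sketch below uses the three normalized example events from this comment; `totalsByName` is a hypothetical helper name, and nothing here touches the Agent's memory yet.

```javascript
// Hypothetical helper: adds up the scores per username.
function totalsByName(events) {
  var totals = {};
  events.forEach(function (event) {
    // Start at 0 the first time a name is seen, then keep adding.
    var previous = Object.prototype.hasOwnProperty.call(totals, event.name)
      ? totals[event.name]
      : 0;
    totals[event.name] = previous + event.score;
  });
  return totals;
}

var events = [
  { name: 'username1', score: 1, page: 'something' },
  { name: 'username2', score: 4, page: 'something' },
  { name: 'username1', score: 3, page: 'something else' }
];

console.log(totalsByName(events)); // { username1: 4, username2: 4 }
```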

tortoo commented 7 years ago

You mean I should first split the username and the score into two separate fields? Does that mean the RSS data should be fetched with a JavaScriptAgent? I don't know how to use the JavaScriptAgent and know little about JavaScript. I'm stuck on this step...😭 I can get the comment content with the PhantomJsCloudAgent, but the page shows at most 20 comments, and the name and content are together. I searched a lot but didn't find an example of fetching a React website.

dsander commented 7 years ago

Are you passing the HTML the PhantomJsCloudAgent extracted to a WebsiteAgent? You should be able to extract elements from the HTML like you normally would if the data was fetched by the WebsiteAgent. I don't think it is possible to interact with a react website.

tortoo commented 7 years ago

Yes, I've already done that. The following is the PhantomJsCloudAgent for another site with the same structure. The site is a React website built by Tencent.

{
  "mode": "merge",
  "api_key": "{% credential PhantomJs Cloud API key %}",
  "url": "{{url2}}",
  "render_type": "plainText",
  "output_as_json_radio": "false",
  "output_as_json": "false",
  "ignore_images_radio": "true",
  "ignore_images": "true",
  "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36",
  "wait_interval": "5000"
}

(url2 is the original URL)

The following is the WebsiteAgent:

{
  "expected_update_period_in_days": "4",
  "url": "{{url}}",
  "type": "text",
  "mode": "merge",
  "extract": {
    "my_recommend_saying": {
      "regexp": "大酋长\\n((.*?)❣加精~(.*?))\\n",
      "index": "1"
    }
  },
  "template": {
    "content_full2": ".\\r {{my_recommend_saying}}{{content_full | regex_replace:'src=\"\\/\\/gpic', '>https://gpic' | regex_replace:'\\/1000\"', '/1000<b' | regex_replace:'<br> |<br> |<br>', '\\r\\r ' | regex_replace:'<p[^>]+>', '\\r\\r ' | regex_replace:'\\n<div class=\"content\"', '\\r\\r <' | regex_replace:'<div class=\"content rich', '\\r\\r <' | regex_replace:'<\\\\?/?[^>]+>', '' | regex_replace:'(((https|http)?:\\/\\/)[^\\s]+/1000)', '<\\1|👻 Pic>' | regex_replace:'\\n\\n', '\\n' | regex_replace:'\\r \\n', '\\r\\r ' }}"
  }
}

(The PhantomJscloud URL)

And if I post the recommended comment after the 20th comment, it ends up on the next page and the PhantomJsCloudAgent can't fetch it.

Sorry, I just remembered that I used the plain text render type for the PhantomJsCloud URL to reduce usage🙈. If I use the HTML type I can get the nickname and the comment content separately, in two fields. I'll try that later. But how can I solve the problem that the PhantomJsCloud page only shows 20 comments? The PHP feed can show as many comments as I ask for, but the names and comments are in the same field.

"description": "1970-01-01 08:00│💬│username1🗣+3 哈哈│username2🗣嘿嘿。+5│username3🗣666。+2│username4🗣好。+1│username5🗣+1000│username6🗣不错"

Is there any way to turn the above into the following?

{ name: 'username1', score: 1, page: 'something' }
{ name: 'username2', score: 4, page: 'something' }
{ name: 'username1', score: 3, page: 'something else' }
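One possible way to do that conversion in JavaScript is sketched below. It assumes the layout of the sample description shown above: '│' separates the comments and '🗣' separates the nickname from the comment text. `parseVotes` and its `page` parameter are hypothetical names. Note that, unlike the regex used earlier in this thread (which keeps only the first digit after each +), this sketch takes all digits that follow the first +, so adjust it if votes are meant to be single digits.

```javascript
// Hypothetical helper: one { name, score, page } record per comment that contains a vote.
function parseVotes(description, page) {
  var SEP = '🗣';
  var votes = [];
  description.split('│').forEach(function (segment) {
    var at = segment.indexOf(SEP);
    if (at === -1) { return; }                  // date and icon segments carry no comment
    var name = segment.slice(0, at);
    var comment = segment.slice(at + SEP.length);
    var match = /\+(\d+)/.exec(comment);        // first "+<digits>" in the comment
    if (!match) { return; }                     // comment without a vote
    votes.push({ name: name, score: parseInt(match[1], 10), page: page });
  });
  return votes;
}

var sample = '1970-01-01 08:00│💬│username1🗣+3 哈哈│username2🗣嘿嘿。+5│username6🗣不错';
console.log(parseVotes(sample, 'something'));
```

Comments without any "+number", such as the last one in the sample, are skipped rather than recorded with a score of 0; that is a design choice and easy to change.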

tortoo commented 7 years ago

This is the output for the PhantomJsCloud URL:

[
  { "name": "aa", "comment": "+2" },
  { "name": "bb", "comment": "+5" },
  { "name": "cc", "comment": "+1" },
  { "name": "cc", "comment": "+3" },
  { "name": "cc", "comment": "" },
  { "name": "cc", "comment": "" },
  { "name": "cc", "comment": "" },
  { "name": "cc", "comment": "+8" },
  { "name": "cc", "comment": "+6" },
  { "name": "bb", "comment": "+4" },
  { "name": "bb", "comment": "" },
  { "name": "aa", "comment": "+3" },
  { "name": "aa", "comment": "" },
  { "name": "aa", "comment": "+7" },
  { "name": "aa", "comment": "++" },
  { "name": "aa", "comment": "++5" },
  { "name": "cc", "comment": "+9" },
  { "name": "cc", "comment": "" },
  { "name": "cc", "comment": "+3" },
  { "name": "cc", "comment": "+3" }
]

It only gets 20 name/comment pairs. The PhantomJsCloud page can't get the 21st and 22nd, which are:

cc: +5
aa: +4

The page url

dsander commented 7 years ago

As I said before, the PhantomJsCloudAgent cannot interact with a website; the next comments are only shown when you click the 'next page' button. The service has some advanced examples which allow executing JavaScript on the page, but that is not supported by our Agent.

The site seems to load the data with AJAX requests. You can inspect those with the Chrome or Firefox developer tools: open the network tab and you can see the requests.

Using a WebsiteAgent in json mode I am able to get more than 20 comments:

{
  "expected_update_period_in_days": "2",
  "url": "https://buluo.qq.com/cgi-bin/bar/post/get_comment_by_page_v2?bid=144106&pid=5794662-1504603171_144106_&num=100&start=0&barlevel=1&r=0.7338251183492299&bkn=",
  "type": "json",
  "mode": "on_change",
  "extract": {
    "data": {
      "path": "$."
    }
  },
  "headers": {
    "referer": "https://buluo.qq.com/p/detail.html?bid=144106&pid=5794662-1504603171_144106_"
  }
}

tortoo commented 7 years ago

I got it! Using JSON mode to fetch the page is sooooo helpful! It's my first time using it, and after looking at the example page I got all 23 comments smoothly! Now the PhantomJsCloudAgent is retired😹. My paths are: "path": "$.result.comments[*].user.nick_name" and "path": "$.result.comments[*].comment.content". Next step: how do I get the sum of scores per username across all events with a JavaScriptAgent?
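To make clear what those two paths select, here is a plain-JavaScript sketch of the same lookup. The `response` object is hypothetical: its shape is inferred only from the two JSONPath expressions, since the real API response is not shown in this thread, and `pairComments` is an illustrative helper name.

```javascript
// Hypothetical response, shaped only after the two JSONPath expressions above.
var response = {
  result: {
    comments: [
      { user: { nick_name: 'aa' }, comment: { content: '+2' } },
      { user: { nick_name: 'bb' }, comment: { content: '+5' } }
    ]
  }
};

// Plain-JavaScript equivalent of the two paths, paired up per comment.
function pairComments(data) {
  return data.result.comments.map(function (entry) {
    return {
      name: entry.user.nick_name,       // $.result.comments[*].user.nick_name
      comment: entry.comment.content    // $.result.comments[*].comment.content
    };
  });
}

console.log(pairComments(response));
```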

dsander commented 7 years ago

I think the simplest approach would be to write a memory key for every user and store the score in there. When new events are received, read the score from the memory using the user's key, increase it, and save it back.

tortoo commented 7 years ago

I'm still searching for some similar code and trying to adapt it. I got this and it's so complicated. Maybe I need a simple example to dry run and practice with.😞

dsander commented 7 years ago

Start with the sample receive function in a new JavaScriptAgent and use the dry run function to send it one of the events it is supposed to handle. this.log('some message') is helpful for debugging your code (the log messages are shown in the dry run modal). Once the part where you set the Agent's memory with this.memory(keyToSet, valueToSet) is working, you need to run the Agent with real events, because the memory does not persist in dry runs.
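A minimal sketch of such a receive function might look like the following. It assumes incoming events whose payload has `name` and `score` fields, as in the normalized example earlier in this thread; the `score:` key prefix and the `receiveVotes` helper are made up for illustration. The handler is written as a plain function so it can also be tried outside Huginn with a stand-in object.

```javascript
// `agent` is expected to offer incomingEvents(), memory(key), memory(key, value)
// and log(), as `this` does inside a JavaScriptAgent.
function receiveVotes(agent) {
  var events = agent.incomingEvents();
  for (var i = 0; i < events.length; i++) {
    var payload = events[i].payload;
    var score = parseInt(payload.score, 10);
    if (!payload.name || isNaN(score)) {
      agent.log('skipping event without name or score');
      continue;
    }
    var key = 'score:' + payload.name;        // one memory key per user (assumed naming)
    var previous = agent.memory(key) || 0;    // nothing stored yet counts as 0
    agent.memory(key, previous + score);      // save the increased total back
    agent.log(payload.name + ' now has ' + (previous + score));
  }
}

// Inside Huginn, Agent is predefined; anywhere else this wiring is skipped.
if (typeof Agent !== 'undefined') {
  Agent.receive = function () { receiveVotes(this); };
}
```

Keeping only a running total per user means the original events do not have to be retained to compute the sums later.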

tortoo commented 7 years ago

It's very helpful and I'm very thankful! I'm trying to learn some JavaScript basics😹, and it takes some time. After that I'll try to write some simple code and test it.😊