Closed duck7000 closed 8 years ago
From what you've quoted, try:
preg_match_all('!<div class="sodatext">\s*<p>\s*(.*?)\s*</p>!ims',str_replace("\n"," ",$this->page["Quotes"]),$matches)
Though that might give you only the first paragraph of each quote (your quoted code doesn't show the "next element"). If you don't mind having the <p>
within:
preg_match_all('!<div class="sodatext">\s*(.*?)\s*</div>!ims',str_replace("\n"," ",$this->page["Quotes"]),$matches)
might be a better fit. (not verified, just inspired by the source given in your question)
Thanks for the quick reply
I tested with this movie: http://www.imdb.com/title/tt0450385/quotes?ref_=ttalt_ql_4 I'm sorry if i'm not detailed enough but here is the whole code
<div id="qt0396883" class="quote soda sodavote odd" ><div class="sodatext">
<p>
<a href="/name/nm0000131/?ref_=tt_trv_qu"
><span class="character">Mike Enslin</span></a>:
[<span class="fine">Olin gives Enslin the room key</span>]
Most hotels have switched to magnetics. An actual key. That's a nice touch, it's antiquey. </p><p>
<a href="/name/nm0000168/?ref_=tt_trv_qu"
><span class="character">Gerald Olin</span></a>:
We have magnetic cards also, but electronics don't seem to work in 1408. Hope you don't have a pacemaker. </p><p>
<a href="/name/nm0000131/?ref_=tt_trv_qu"
><span class="character">Mike Enslin</span></a>:
[<span class="fine">into his tape recorder</span>]
General manager claims that the phantom in room interferes... </p><p>
<a href="/name/nm0000168/?ref_=tt_trv_qu"
><span class="character">Gerald Olin</span></a>:
I have *never* used the word "phantom." </p><p>
<a href="/name/nm0000131/?ref_=tt_trv_qu"
><span class="character">Mike Enslin</span></a>:
Oh, I'm sorry. Uh, spirit? Specter? </p><p>
<a href="/name/nm0000168/?ref_=tt_trv_qu"
><span class="character">Gerald Olin</span></a>:
No, you misunderstand. Whatever's in 1408 is nothing like that. </p><p>
<a href="/name/nm0000131/?ref_=tt_trv_qu"
><span class="character">Mike Enslin</span></a>:
Then what is it? </p><p>
<a href="/name/nm0000168/?ref_=tt_trv_qu"
><span class="character">Gerald Olin</span></a>:
It's an evil fucking room. </p></div>
I will try your second solution, if that works fine i strip those p tags off in my program Thanks Ed
both preg_match_all works, but give a foreach error invalid argument. I changed the foreach to index 1 and that works fine!
however the method is replacing line breaks with space so all line breaks within each paragraph are also gone..
New lines are just for code formatting here. Especially with the second variant, where you get full HTML returned (including <P>
aragraphs, the \n
ewline characters should not matter.
It might just be a quick "work-around" for you. I no longer maintain the code (I did for 12 years, but a while ago Tom has taken over; currently I'm not even use IMDBPHP myself) – so for the final fix, better let @tboothman (Tom) decide.
Oke fair enough i'll wait for tbootman to reply
ps the first preg_match_all does indeed capture only the first paragrah, the second all inside the div for now i use the second one as a work arround
Thanks Ed
Ah, it's Ed :) Yepp, I thought so when setting up the "quick-fix regexes". Tom will do the fine-tuning and provide a real solution – while you (and others finding this issue here) at least have a work-around until then.
Ahh, I see how it is. You get the glory and I write the tests :P
That output is really dirty. Really each quote should be split into lines and which character said them and who the actor is. I'm not particularly interested in doing the grunt work on that though and it'd change the return signature.
A 'quote' from the matrix
<p>
<a href="/name/nm0000206/?ref_=tt_trv_qu"
><span class="character">Neo</span></a>:
I thought it wasn't real </p><p>
<a href="/name/nm0000401/?ref_=tt_trv_qu"
><span class="character">Morpheus</span></a>:
Your mind makes it real </p><p>
<a href="/name/nm0000206/?ref_=tt_trv_qu"
><span class="character">Neo</span></a>:
If you're killed in the matrix, you die here? </p><p>
<a href="/name/nm0000401/?ref_=tt_trv_qu"
><span class="character">Morpheus</span></a>:
The body cannot live without the mind </p>
Ahh, I see how it is. You get the glory and I write the tests :P
If you insist, I agree with that :smiley_cat:
Honestly: I just wanted to provide a "quick work-around", giving you time for a good solution. What I wrote is just a quick-and-dirty hack, and certainly needs some "fine-tuning" at least (and rather a lot of the latter, I'm afraid) …
So this is the solution for now? I agree that the output is rough though..
i think that , if it's possible, to somehow capture all paragraphs in the div's would be a nice solution? And then split the capture in name, imdbid and comment?
I understand that would be a lot of more work though
@duck7000 as I understand Tom, it means he's still working on a "good solution" and considers mine a "usable work-around, not clean enough to be accepted as solution" – as I do. I fully trust him to come up with something much better and cleaner than my "quick hack", which certainly will have a "Wow!" effect. I know why I handed over the project to Tom: he's at least as perfectionist as I am, so no worries :stuck_out_tongue_winking_eye:
Give him a little time, though – IMDB is tricky, and all of us want something that stays working a little longer. Perfectionists' work takes a bit more – but then it's, well, perfect :smiley_cat:
Well ... I've 'fixed' the method back to how it was before. I don't really think what it's producing is particularly good but that is what the method says it does.
If you want to make a method that does something more useful I wouldn't be opposed to writing some tests for it. I think I'd recommend using xmldom / xpath for this one though because it's quite nicely structured. http://www.imdb.com/title/tt0133093/quotes has quite a few variations including stage directions as its own line.
Oke fair enough, the method now works like it always did i guess, although i never used it before.. I'm upgrading my program so i thought it would be nice to get as much info from imdb as possible.
So i accept that you both wouldn't spend the time to make it better Thanks for the fix and input
This method is broken due to site changes i guess
page source looks like this:
the preg_match like this
I tried to fix it myself but my skills with preg_match are slim. obviously the class name is wrong but that is as far as i get
Thanks Ed